INN Hotels Project

Submitted by Neha Biswas

Context

A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellations include changes of plans and scheduling conflicts. Canceling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is a less desirable and possibly revenue-diminishing factor for hotels. Such losses are particularly high for last-minute cancellations.

The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.

The cancellation of bookings impacts a hotel on various fronts:

  • Loss of resources (revenue) when the hotel cannot resell the room.
  • Additional costs of distribution channels by increasing commissions or paying for publicity to help sell these rooms.
  • Lower prices at the last minute so that the hotel can resell a room, which reduces the profit margin.
  • Human resources to make arrangements for the guests.

Objective

The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. INN Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help formulate profitable policies for cancellations and refunds.

Data Description

The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.

Data Dictionary

  • Booking_ID: Unique identifier of each booking
  • no_of_adults: Number of adults
  • no_of_children: Number of children
  • no_of_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
  • no_of_week_nights: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
  • type_of_meal_plan: Type of meal plan booked by the customer:
    • Not Selected – No meal plan selected
    • Meal Plan 1 – Breakfast
    • Meal Plan 2 – Half board (breakfast and one other meal)
    • Meal Plan 3 – Full board (breakfast, lunch, and dinner)
  • required_car_parking_space: Does the customer require a car parking space? (0 - No, 1 - Yes)
  • room_type_reserved: Type of room reserved by the customer. The values are ciphered (encoded) by INN Hotels.
  • lead_time: Number of days between the date of booking and the arrival date
  • arrival_year: Year of arrival date
  • arrival_month: Month of arrival date
  • arrival_date: Day of the month of the arrival date
  • market_segment_type: Market segment designation.
  • repeated_guest: Is the customer a repeated guest? (0 - No, 1 - Yes)
  • no_of_previous_cancellations: Number of previous bookings that were canceled by the customer prior to the current booking
  • no_of_previous_bookings_not_canceled: Number of previous bookings not canceled by the customer prior to the current booking
  • avg_price_per_room: Average price per day of the reservation, in euros; room prices are dynamic.
  • no_of_special_requests: Total number of special requests made by the customer (e.g., high floor, view from the room, etc.)
  • booking_status: Flag indicating if the booking was canceled or not.

Leading Questions:

  1. What are the busiest months in the hotel?
  2. Which market segment do most of the guests come from?
  3. Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments?
  4. What percentage of bookings are canceled?
  5. Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel?
  6. Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?

Importing necessary libraries and data

In [1]:
# Importing libraries for reading and manipulating the data:
import numpy as np
import pandas as pd

# Importing libraries for data visualization:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [2]:
import warnings
warnings.filterwarnings("ignore")
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter("ignore", ConvergenceWarning)
In [3]:
# Removes the limit for the number of displayed columns:
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows: 
pd.set_option("display.max_rows", 200)
# Sets the display precision of floating-point numbers to 5 decimal places:
pd.set_option("display.float_format", lambda x: "%.5f" % x)
In [4]:
# To build logistic regression model using statsmodels:
import statsmodels.api as sm
# Importing function train_test_split to split the data into train and test:
from sklearn.model_selection import train_test_split
# Importing function variance_inflation_factor to compute VIF:
from statsmodels.stats.outliers_influence import variance_inflation_factor
In [5]:
# Metric Scores : to check model performance
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    roc_auc_score,
    precision_recall_curve,
    roc_curve,
    make_scorer
)
In [6]:
# To build decision tree model:
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
In [7]:
# To tune different models
from sklearn.model_selection import GridSearchCV

Function Definition for different functions used

In [8]:
# Function to create histogram and box plots:
def creating_hist_box(df, feature, kde=True, bins=None, figsize=(10, 4)):
    # one row with two panels: histogram on the left, boxplot on the right
    f2, (ax_hist, ax_box) = plt.subplots(nrows=1, ncols=2, figsize=figsize)
    f2.tight_layout(pad=5.0)

    if bins:
        sns.histplot(data=df, x=feature, kde=kde, ax=ax_hist, bins=bins)
        ax_hist.set_title(f'Histogram with bins = {bins}')
    else:
        sns.histplot(data=df, x=feature, kde=kde, ax=ax_hist)
        ax_hist.set_title('Histogram with default no. of bins')

    sns.boxplot(data=df, x=feature, ax=ax_box, showmeans=True, color="violet")
    ax_box.set_title('Boxplot')
In [9]:
# Function to create labeled barplots:
def labeled_barplot(data, feature, perc=False, n=None, rotatn=0):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    rotatn: rotation angle of the x tick labels in degrees (default is 0, i.e., horizontal; use 90 for vertical)
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))

    plt.xticks(rotation=rotatn,fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n],
    )

    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # top of the bar
        

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot
In [10]:
# Function to create stacked barplots:
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
    # placing the legend outside the plot area, to the right of the bars
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
In [11]:
# Function to plot distributions wrt target:
def distribution_plot_wrt_target(data, predictor, target):

    fig, axs = plt.subplots(2, 2, figsize=(12, 10))

    target_uniq = data[target].unique()

    axs[0, 0].set_title("Distribution of " + predictor + " for " + target + "=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )

    axs[0, 1].set_title("Distribution of " + predictor + " for " + target + "=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )

    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")

    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )

    plt.tight_layout()
    plt.show()
In [12]:
# Function to compute different metrics to check the performance of a classification model built using statsmodels:
def model_performance_classification_statsmodels(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """

    # flagging observations whose predicted probability exceeds the threshold
    pred_temp = model.predict(predictors) > threshold
    # converting the boolean flags to 0/1 class labels
    pred = pred_temp.astype(int)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )

    return df_perf
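
To make the thresholding step concrete, here is a minimal sketch on made-up values. The probabilities and labels below are illustrative assumptions, not taken from the dataset. It turns predicted probabilities into classes at a chosen threshold and recomputes the same four metrics by hand from the confusion counts.

```python
import numpy as np

# Hypothetical predicted probabilities and true labels (1 = canceled).
probs = np.array([0.91, 0.40, 0.65, 0.08, 0.55, 0.30])
y_true = np.array([1, 1, 0, 0, 1, 0])

threshold = 0.5
y_pred = (probs > threshold).astype(int)  # class 1 when probability > threshold

# Confusion counts for the positive class (1).
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)        # share of actual cancellations that were caught
precision = tp / (tp + fp)     # share of predicted cancellations that were right
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, recall, precision, f1)
```

Lowering the threshold would flag more bookings as cancellations, which tends to raise recall at the cost of precision.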
In [13]:
# Function to plot the confusion_matrix of a classification model:
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
In [14]:
# Function to calculate the variance inflation factor:
def checking_vif(predictors):
    vif = pd.DataFrame()
    vif["feature"] = predictors.columns

    # calculating VIF for each feature
    vif["VIF"] = [
        variance_inflation_factor(predictors.values, i)
        for i in range(len(predictors.columns))
    ]
    return vif
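
As a rough illustration of what the VIF measures, the sketch below computes it from its textbook definition, 1 / (1 - R²), where R² comes from regressing one feature on all the others plus an intercept. The helper name `vif_by_definition` and the synthetic columns are assumptions made for this example. Note that the statsmodels function used above does not add an intercept on its own, so this is not a drop-in replica of `checking_vif`.

```python
import numpy as np

def vif_by_definition(features):
    """Return the VIF of each column: 1 / (1 - R^2) of that column on the rest."""
    features = np.asarray(features, dtype=float)
    n_rows, n_cols = features.shape
    vifs = np.empty(n_cols)
    for i in range(n_cols):
        y = features[:, i]
        # Regress column i on the remaining columns plus an intercept.
        others = np.column_stack([np.ones(n_rows), np.delete(features, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[i] = 1.0 / (1.0 - r_squared)
    return vifs

# Synthetic data: x3 is almost a linear combination of x1 and x2, x4 is independent.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)
x4 = rng.normal(size=n)

features = np.column_stack([x1, x2, x3, x4])
vifs = vif_by_definition(features)
print(vifs)  # large values for x1, x2, x3; close to 1 for x4
```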
In [15]:
# Function to compute different metrics to check performance of a classification model built using sklearn:
def model_performance_classification_sklearn(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    """

    # predicting using the independent variables
    pred = model.predict(predictors)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )

    return df_perf
In [16]:
# defining a function to plot the confusion_matrix of a dtree model:
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
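
The annotation logic in the two confusion-matrix helpers can be checked without plotting. The sketch below builds the same "count plus percentage" labels for a made-up 2x2 matrix. The counts are assumptions for illustration, and the layout follows scikit-learn's convention of true labels in rows and predicted labels in columns.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true labels, columns are predictions.
cm = np.array([[50, 10],
               [5, 35]])
total = cm.sum()

# One label per cell: the raw count on the first line, its share of all
# observations on the second line.
labels = np.array(
    [f"{int(count)}\n{count / total:.2%}" for count in cm.flatten()]
).reshape(2, 2)

print(labels[1, 1])
```

Because every cell is divided by the overall total, the four percentages sum to 100% across the whole matrix rather than within each row.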

Data Overview

  • Observations
  • Sanity checks
In [17]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [18]:
# Loading the data:
data = pd.read_csv('/content/drive/MyDrive/Univ_Texas/Supervised_Learning_Classification/Project/INNHotelsGroup.csv')
In [19]:
# Displaying first 5 rows of the dataset: 
data.head()
Out[19]:
Booking_ID no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
0 INN00001 2 0 1 2 Meal Plan 1 0 Room_Type 1 224 2017 10 2 Offline 0 0 0 65.00000 0 Not_Canceled
1 INN00002 2 0 2 3 Not Selected 0 Room_Type 1 5 2018 11 6 Online 0 0 0 106.68000 1 Not_Canceled
2 INN00003 1 0 2 1 Meal Plan 1 0 Room_Type 1 1 2018 2 28 Online 0 0 0 60.00000 0 Canceled
3 INN00004 2 0 0 2 Meal Plan 1 0 Room_Type 1 211 2018 5 20 Online 0 0 0 100.00000 0 Canceled
4 INN00005 2 0 1 1 Not Selected 0 Room_Type 1 48 2018 4 11 Online 0 0 0 94.50000 0 Canceled
In [20]:
# Displaying the last 5 rows of the dataset: 
data.tail()
Out[20]:
Booking_ID no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
36270 INN36271 3 0 2 6 Meal Plan 1 0 Room_Type 4 85 2018 8 3 Online 0 0 0 167.80000 1 Not_Canceled
36271 INN36272 2 0 1 3 Meal Plan 1 0 Room_Type 1 228 2018 10 17 Online 0 0 0 90.95000 2 Canceled
36272 INN36273 2 0 2 6 Meal Plan 1 0 Room_Type 1 148 2018 7 1 Online 0 0 0 98.39000 2 Not_Canceled
36273 INN36274 2 0 0 3 Not Selected 0 Room_Type 1 63 2018 4 21 Online 0 0 0 94.50000 0 Canceled
36274 INN36275 2 0 1 2 Meal Plan 1 0 Room_Type 1 207 2018 12 30 Offline 0 0 0 161.67000 0 Not_Canceled

Observation:

  • The dataframe has 19 columns. Each row represents a hotel room booking made by a customer.
In [21]:
# Checking shape of the dataset: 
data.shape
print(f'No. of rows in the dataset: {data.shape[0]}')
print(f'No. of columns in the dataset: {data.shape[1]}')
No. of rows in the dataset: 36275
No. of columns in the dataset: 19

Observation:

  • There are 36275 rows and 19 columns in the dataset.
In [22]:
# Info table: Checking the datatypes of the columns for the dataset:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   Booking_ID                            36275 non-null  object 
 1   no_of_adults                          36275 non-null  int64  
 2   no_of_children                        36275 non-null  int64  
 3   no_of_weekend_nights                  36275 non-null  int64  
 4   no_of_week_nights                     36275 non-null  int64  
 5   type_of_meal_plan                     36275 non-null  object 
 6   required_car_parking_space            36275 non-null  int64  
 7   room_type_reserved                    36275 non-null  object 
 8   lead_time                             36275 non-null  int64  
 9   arrival_year                          36275 non-null  int64  
 10  arrival_month                         36275 non-null  int64  
 11  arrival_date                          36275 non-null  int64  
 12  market_segment_type                   36275 non-null  object 
 13  repeated_guest                        36275 non-null  int64  
 14  no_of_previous_cancellations          36275 non-null  int64  
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64  
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64  
 18  booking_status                        36275 non-null  object 
dtypes: float64(1), int64(13), object(5)
memory usage: 5.3+ MB
In [23]:
# List of columns with 'int' datatype:
data.select_dtypes(include='int').columns.to_list()
Out[23]:
['no_of_adults',
 'no_of_children',
 'no_of_weekend_nights',
 'no_of_week_nights',
 'required_car_parking_space',
 'lead_time',
 'arrival_year',
 'arrival_month',
 'arrival_date',
 'repeated_guest',
 'no_of_previous_cancellations',
 'no_of_previous_bookings_not_canceled',
 'no_of_special_requests']

Observation:

  • The columns have int, float, and object datatypes.
  • The 5 columns of object datatype are Booking_ID, type_of_meal_plan, room_type_reserved, market_segment_type and booking_status.
  • 13 columns are of int datatype.
  • There is only 1 column of float datatype i.e., avg_price_per_room.
In [24]:
# Checking percentage of missing values: 
(data.isnull().sum()/data.shape[0])*100
Out[24]:
Booking_ID                             0.00000
no_of_adults                           0.00000
no_of_children                         0.00000
no_of_weekend_nights                   0.00000
no_of_week_nights                      0.00000
type_of_meal_plan                      0.00000
required_car_parking_space             0.00000
room_type_reserved                     0.00000
lead_time                              0.00000
arrival_year                           0.00000
arrival_month                          0.00000
arrival_date                           0.00000
market_segment_type                    0.00000
repeated_guest                         0.00000
no_of_previous_cancellations           0.00000
no_of_previous_bookings_not_canceled   0.00000
avg_price_per_room                     0.00000
no_of_special_requests                 0.00000
booking_status                         0.00000
dtype: float64
In [25]:
# Checking for duplicate rows: 
data.duplicated().sum()
Out[25]:
0

Observations:

  • There are no missing values in any of the columns.
  • There are no duplicate rows in the dataset.
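
For readers who want to see what these two checks return when a dataset is not clean, here is a small sketch on a made-up frame. The values are assumptions for illustration only. It uses `isnull().mean()`, which is equivalent to dividing the null counts by the number of rows.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column and one repeated row.
toy = pd.DataFrame({
    "lead_time": [10, np.nan, 35, 10],
    "market_segment_type": ["Online", "Offline", None, "Online"],
})

missing_pct = toy.isnull().mean() * 100       # percentage of missing values per column
n_duplicates = int(toy.duplicated().sum())    # rows identical to an earlier row

print(missing_pct)
print(n_duplicates)
```

Note that `duplicated()` compares all columns, so a dataset with a unique identifier such as Booking_ID cannot contain duplicate rows by this check, even if the remaining columns repeat.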
In [26]:
# Statistical summary of the dataset
data.describe().T
Out[26]:
                                            count        mean         std         min         25%         50%         75%         max
no_of_adults                          36275.00000     1.84496     0.51871     0.00000     2.00000     2.00000     2.00000     4.00000
no_of_children                        36275.00000     0.10528     0.40265     0.00000     0.00000     0.00000     0.00000    10.00000
no_of_weekend_nights                  36275.00000     0.81072     0.87064     0.00000     0.00000     1.00000     2.00000     7.00000
no_of_week_nights                     36275.00000     2.20430     1.41090     0.00000     1.00000     2.00000     3.00000    17.00000
required_car_parking_space            36275.00000     0.03099     0.17328     0.00000     0.00000     0.00000     0.00000     1.00000
lead_time                             36275.00000    85.23256    85.93082     0.00000    17.00000    57.00000   126.00000   443.00000
arrival_year                          36275.00000  2017.82043     0.38384  2017.00000  2018.00000  2018.00000  2018.00000  2018.00000
arrival_month                         36275.00000     7.42365     3.06989     1.00000     5.00000     8.00000    10.00000    12.00000
arrival_date                          36275.00000    15.59700     8.74045     1.00000     8.00000    16.00000    23.00000    31.00000
repeated_guest                        36275.00000     0.02564     0.15805     0.00000     0.00000     0.00000     0.00000     1.00000
no_of_previous_cancellations          36275.00000     0.02335     0.36833     0.00000     0.00000     0.00000     0.00000    13.00000
no_of_previous_bookings_not_canceled  36275.00000     0.15341     1.75417     0.00000     0.00000     0.00000     0.00000    58.00000
avg_price_per_room                    36275.00000   103.42354    35.08942     0.00000    80.30000    99.45000   120.00000   540.00000
no_of_special_requests                36275.00000     0.61966     0.78624     0.00000     0.00000     0.00000     1.00000     5.00000
In [27]:
# Statistical summary of the dataset for 'object' columns:
data.describe(include='object')
Out[27]:
        Booking_ID  type_of_meal_plan  room_type_reserved  market_segment_type  booking_status
count        36275              36275               36275                36275           36275
unique       36275                  4                   7                    5               2
top       INN00001        Meal Plan 1         Room_Type 1               Online    Not_Canceled
freq             1              27835               28130                23214           24390

Observations:

  • There are 36275 unique reservations in the dataset.
  • Most reservations have 2 adult guests; the highest number of adults is 4.
  • The number of children in a reservation ranges from 0 to 10.
  • The number of weekend nights in a reservation ranges from 0 to 7.
  • The number of week nights in a reservation ranges from 0 to 17.
  • There are 4 meal plan categories (including 'Not Selected') that a customer can choose when making a reservation; the most popular is Meal Plan 1 (i.e., breakfast).
  • Each reservation requires either 0 or 1 car parking space.
  • Customers reserve 7 different room types; Room_Type 1 is the most popular.
  • The number of days between a reservation being made and the arrival date ranges from 0 to 443. On average, a reservation is made 85.23 days before the arrival date.
  • The dataset contains reservation details for the years 2017 and 2018.
  • All 12 months are present in the dataset.
  • The hotel caters to 5 market segments, of which Online is the most common.
  • Most dates of the month seem to have a similar share of reservations, with the largest dip on the 31st.
  • Most customers do not seem to have previously canceled reservations.
  • Most customers do not seem to have previous reservations.
  • The average price of a room per day ranges from 0 euros to 540 euros, with a mean of 103.42 euros.
  • The maximum number of special requests made by a guest is 5.
  • The majority of the reservations (24390 of 36275, about 67%) are not canceled.
In [28]:
# 'Booking_ID' is a unique identifier with no predictive value, so we drop the column for further analysis:
df = data.drop('Booking_ID', axis=1)
df.head()
Out[28]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
0 2 0 1 2 Meal Plan 1 0 Room_Type 1 224 2017 10 2 Offline 0 0 0 65.00000 0 Not_Canceled
1 2 0 2 3 Not Selected 0 Room_Type 1 5 2018 11 6 Online 0 0 0 106.68000 1 Not_Canceled
2 1 0 2 1 Meal Plan 1 0 Room_Type 1 1 2018 2 28 Online 0 0 0 60.00000 0 Canceled
3 2 0 0 2 Meal Plan 1 0 Room_Type 1 211 2018 5 20 Online 0 0 0 100.00000 0 Canceled
4 2 0 1 1 Not Selected 0 Room_Type 1 48 2018 4 11 Online 0 0 0 94.50000 0 Canceled
In [29]:
# Since 'booking_status' is a flag column indicating whether the booking was canceled or not
# we change the 'Not_Canceled' values to 0 and 'Canceled' values to 1 for further analysis 
booking_flag = {'Not_Canceled':0, 'Canceled':1}
df['booking_status'] = df['booking_status'].map(booking_flag)
In [30]:
df.head()
Out[30]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
0 2 0 1 2 Meal Plan 1 0 Room_Type 1 224 2017 10 2 Offline 0 0 0 65.00000 0 0
1 2 0 2 3 Not Selected 0 Room_Type 1 5 2018 11 6 Online 0 0 0 106.68000 1 0
2 1 0 2 1 Meal Plan 1 0 Room_Type 1 1 2018 2 28 Online 0 0 0 60.00000 0 1
3 2 0 0 2 Meal Plan 1 0 Room_Type 1 211 2018 5 20 Online 0 0 0 100.00000 0 1
4 2 0 1 1 Not Selected 0 Room_Type 1 48 2018 4 11 Online 0 0 0 94.50000 0 1
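
One convenient side effect of the 0/1 encoding is that the mean of the column is the cancellation rate. The sketch below shows this on a made-up series, using `Series.map` as one way to apply the dictionary. The five status values are assumptions for illustration. On the full dataset, the summary above (24390 of 36275 bookings not canceled) implies 11885 cancellations, or roughly 32.8%.

```python
import pandas as pd

# Toy status column; the real column has 36275 entries.
toy_status = pd.Series(
    ["Not_Canceled", "Canceled", "Canceled", "Not_Canceled", "Not_Canceled"]
)

booking_flag = {"Not_Canceled": 0, "Canceled": 1}
encoded = toy_status.map(booking_flag)

# map() yields NaN for any label missing from the dictionary, so this check
# guards against unexpected categories.
assert encoded.notna().all()

cancel_rate_pct = encoded.mean() * 100  # share of 1s, i.e. canceled bookings
print(cancel_rate_pct)
```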

Exploratory Data Analysis (EDA)

  • EDA is an important part of any project involving data.
  • It is important to investigate and understand the data better before building a model with it.
  • A few questions have been mentioned above which will help you approach the analysis in the right manner and generate insights from the data.
  • A thorough analysis of the data, in addition to the questions mentioned above, should be done.

Univariate Analysis

lead_time

In [31]:
# Histplot and Boxplot to show the distribution of data for the column 'lead_time':
creating_hist_box(df, 'lead_time', bins=40)
In [32]:
df['lead_time'].describe()
Out[32]:
count   36275.00000
mean       85.23256
std        85.93082
min         0.00000
25%        17.00000
50%        57.00000
75%       126.00000
max       443.00000
Name: lead_time, dtype: float64

Observations:

  • The distribution for the number of days between the booking date and the arrival date is highly right-skewed.
  • On average, a room reservation is made 85.23 days in advance, with a standard deviation of about 85.93 days.
  • The lead time ranges from 0 days (i.e., booked on the day of arrival) to 443 days (i.e., about 1.2 years).
  • The median 'lead_time' is 57 days, which is less than the mean, consistent with the right skew.
  • There are outlier values on the higher end of the distribution for 'lead_time'.
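
The outliers seen in the boxplot can be cross-checked with the quartiles printed above. The sketch below applies the usual 1.5 × IQR rule, which is the default whisker setting in matplotlib and seaborn boxplots. The variable names are chosen for this example.

```python
# Quartiles and extremes of lead_time taken from the describe() output above.
q1, q3 = 17.0, 126.0
lead_time_min, lead_time_max = 0.0, 443.0

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # points below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr     # points above this are flagged as outliers

has_low_outliers = lead_time_min < lower_fence
has_high_outliers = lead_time_max > upper_fence

print(iqr, lower_fence, upper_fence)          # 109.0 -146.5 289.5
print(has_low_outliers, has_high_outliers)    # False True
```

Under this rule, bookings made more than 289.5 days ahead are flagged, and the negative lower fence means no booking can be a low-end outlier.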

avg_price_per_room

In [33]:
# Histplot and Boxplot to show the distribution of data for the column 'avg_price_per_room':
creating_hist_box(df, 'avg_price_per_room', bins=40)
In [34]:
df['avg_price_per_room'].describe()
Out[34]:
count   36275.00000
mean      103.42354
std        35.08942
min         0.00000
25%        80.30000
50%        99.45000
75%       120.00000
max       540.00000
Name: avg_price_per_room, dtype: float64

Observations:

  • The distribution for the average price of the reservation per day is slightly right-skewed.
  • The average price of reservation per day is around 103.42 euros, while the median price is 99.45 euros.
  • The average price of a reservation per day ranges from 0 euros to 540 euros.
  • The standard deviation for average price of reservation per day is around 35.09 euros.
  • There are outlier values on both the lower and higher ends of the distribution for the average price per day for reservations.
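
A quick way to put numbers on "slightly" versus "highly" right-skewed is Pearson's median skewness, 3 × (mean − median) / std. This is a rough heuristic chosen for this sketch, not a measure computed in the analysis above. The inputs are the summary statistics already printed for 'lead_time' and 'avg_price_per_room'.

```python
def pearson_median_skewness(mean, median, std):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std."""
    return 3 * (mean - median) / std

# Summary statistics copied from the describe() outputs above.
lead_time_skew = pearson_median_skewness(85.23256, 57.0, 85.93082)
price_skew = pearson_median_skewness(103.42354, 99.45, 35.08942)

print(round(lead_time_skew, 2), round(price_skew, 2))  # 0.99 0.34
```

Both values are positive, which matches a longer right tail, and the value for 'lead_time' is roughly three times the one for 'avg_price_per_room'.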
In [35]:
df.loc[df['avg_price_per_room']==0].shape
Out[35]:
(545, 18)
In [36]:
df.loc[df["avg_price_per_room"] == 0, ["market_segment_type"]].value_counts()
Out[36]:
market_segment_type
Complementary          354
Online                 191
dtype: int64

Observations:

  • Of the 545 reservations with an average price of 0 euros, 354 belong to the Complementary market segment.
  • The remaining 191 zero-price reservations were made online.
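
The same breakdown can be written with a boolean mask and a single-column selection, which returns a plain Series instead of a one-column DataFrame. The sketch below uses a made-up frame, so the rows are assumptions for illustration only.

```python
import pandas as pd

# Toy bookings: three of the five rows have a zero average price.
toy = pd.DataFrame({
    "avg_price_per_room": [0.0, 95.0, 0.0, 0.0, 120.5],
    "market_segment_type": [
        "Complementary", "Online", "Online", "Complementary", "Offline",
    ],
})

zero_price = toy["avg_price_per_room"] == 0

# Count zero-price bookings per market segment.
counts = toy.loc[zero_price, "market_segment_type"].value_counts()

# The mean of a boolean mask is the share of True values.
share_zero_pct = zero_price.mean() * 100

print(counts)
print(share_zero_pct)
```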

no_of_previous_cancellations

In [37]:
# Histplot and Boxplot to show the distribution of data for the column 'no_of_previous_cancellations':
creating_hist_box(df, 'no_of_previous_cancellations', bins=40)
In [38]:
df['no_of_previous_cancellations'].describe()
Out[38]:
count   36275.00000
mean        0.02335
std         0.36833
min         0.00000
25%         0.00000
50%         0.00000
75%         0.00000
max        13.00000
Name: no_of_previous_cancellations, dtype: float64
In [39]:
df.loc[df['no_of_previous_cancellations']==0].shape[0]/df.shape[0] * 100
Out[39]:
99.06822880771881
In [40]:
df.loc[df['no_of_previous_cancellations']!=0].shape[0]/df.shape[0] * 100
Out[40]:
0.9317711922811854

Observations:

  • The distribution of the number of previous bookings that were canceled by the customer before the current booking is highly right-skewed.
  • The vast majority of reservations (about 99.07%) have no prior cancellations. Only about 0.93% of reservations come from customers with previously canceled bookings.
  • The maximum number of prior cancellations associated with a reservation is 13.

no_of_previous_bookings_not_canceled

In [41]:
# Histplot and Boxplot to show the distribution of data for the column 'no_of_previous_bookings_not_canceled':
creating_hist_box(df, 'no_of_previous_bookings_not_canceled', bins=40)
In [42]:
df['no_of_previous_bookings_not_canceled'].describe()
Out[42]:
count   36275.00000
mean        0.15341
std         1.75417
min         0.00000
25%         0.00000
50%         0.00000
75%         0.00000
max        58.00000
Name: no_of_previous_bookings_not_canceled, dtype: float64

Observations:

  • The distribution of the number of previous bookings that were not canceled by the customer prior to the current booking is highly right-skewed.
  • Most customers do not seem to have previous reservations.

no_of_adults

In [43]:
# Barplot for the column 'no_of_adults' in the dataset:
labeled_barplot(df,'no_of_adults', perc=True)
In [44]:
df['no_of_adults'].value_counts()/df.shape[0] * 100
Out[44]:
2   71.97243
1   21.21296
3    6.38732
0    0.38318
4    0.04411
Name: no_of_adults, dtype: float64

Observations:

  • Most reservations (about 72%) are made for 2 adult guests, followed by about 21% of reservations with 1 adult guest.

no_of_children

In [45]:
# Barplot for the column 'no_of_children' in the dataset:
labeled_barplot(df, 'no_of_children', perc=True)
In [46]:
df['no_of_children'].value_counts()/df.shape[0] * 100
Out[46]:
0    92.56237
1     4.46037
2     2.91661
3     0.05238
9     0.00551
10    0.00276
Name: no_of_children, dtype: float64

Observations:

  • Most reservations (about 92.6%) have no children, while about 4.5% of the reservations have 1 child.
  • Less than 0.01% of the reservations include a large number of children (i.e., 9 or 10).

no_of_week_nights

In [47]:
# Barplot for the column 'no_of_week_nights' in the dataset:
labeled_barplot(df, 'no_of_week_nights', perc=True)
In [48]:
(df['no_of_week_nights'].value_counts()/df.shape[0] * 100).head(5)
Out[48]:
2   31.54790
1   26.15575
3   21.60992
4    8.24259
0    6.58029
Name: no_of_week_nights, dtype: float64
In [49]:
(df['no_of_week_nights'].value_counts()/df.shape[0] * 100).tail()
Out[49]:
12   0.02481
14   0.01930
13   0.01378
17   0.00827
16   0.00551
Name: no_of_week_nights, dtype: float64

Observations:

  • Almost 88% of the reservations have 1 to 4 weeknights.
  • Most guests stay 2 weeknights, followed by 1 and 3 weeknights, as shown below:

No. of week nights   Percentage of reservations (%)
2                    31.55
1                    26.16
3                    21.61
4                     8.24
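
The "almost 88%" figure can be reproduced by adding up the individual shares printed above. A minimal sketch, with the percentages copied from the value_counts output and a dictionary name chosen for this example:

```python
# Share of reservations (%) by number of week nights, from the output above.
week_night_share = {2: 31.54790, 1: 26.15575, 3: 21.60992, 4: 8.24259, 0: 6.58029}

# Combined share of reservations with 1 to 4 week nights.
share_1_to_4 = sum(week_night_share[n] for n in (1, 2, 3, 4))

print(round(share_1_to_4, 2))  # 87.56
```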

no_of_weekend_nights

In [50]:
# Barplot for the column 'no_of_weekend_nights' in the dataset:
labeled_barplot(df, 'no_of_weekend_nights', perc=True)
In [51]:
(df['no_of_weekend_nights'].value_counts()/df.shape[0] * 100)
Out[51]:
0   46.51137
1   27.55341
2   25.00620
3    0.42178
4    0.35562
5    0.09373
6    0.05513
7    0.00276
Name: no_of_weekend_nights, dtype: float64

Observations:

  • Around 99% of the reservations have 0 to 2 weekend nights.
  • The percentage distribution for the most common numbers of weekend nights is:
    No. of weekend nights   Percentage of reservations (%)
    0                       46.51
    1                       27.55
    2                       25.01

required_car_parking_space

In [52]:
# Barplot for the column 'required_car_parking_space' in the dataset:
labeled_barplot(df, 'required_car_parking_space', perc=True)
In [53]:
df['required_car_parking_space'].value_counts()
Out[53]:
0    35151
1     1124
Name: required_car_parking_space, dtype: int64
In [54]:
(df['required_car_parking_space'].value_counts()/df.shape[0] *100)
Out[54]:
0   96.90145
1    3.09855
Name: required_car_parking_space, dtype: float64

Observations:

  • The column 'required_car_parking_space' tells us whether the reservation includes a required car parking spot(1) or not(0).
  • Around 3.1% of the reservations require 1 car parking spot while the rest do not require a parking spot.

type_of_meal_plan

In [55]:
# Barplot for the column 'type_of_meal_plan' in the dataset:
labeled_barplot(df,'type_of_meal_plan', perc=True, rotatn=90)
In [56]:
round(df['type_of_meal_plan'].value_counts()/df.shape[0] *100, 2)
Out[56]:
Meal Plan 1    76.73000
Not Selected   14.14000
Meal Plan 2     9.11000
Meal Plan 3     0.01000
Name: type_of_meal_plan, dtype: float64

Observations:

  • Around 14% of reservations had no meal plan selected.
  • In the majority of the reservations (almost 76.7%), guests selected 'Meal Plan 1', which includes breakfast with their stay.
  • Meal plans chosen by the guests are shown below:
    Meal plan      Includes                     Percentage of reservations (%)
    Not Selected   N/A                          14.14
    Meal Plan 1    Breakfast                    76.73
    Meal Plan 2    Breakfast + Lunch             9.11
    Meal Plan 3    Breakfast + Lunch + Dinner    0.01

room_type_reserved

In [57]:
# Barplot for the column 'room_type_reserved' in the dataset:
labeled_barplot(df, 'room_type_reserved', perc=True, rotatn=90)
In [58]:
round(df['room_type_reserved'].value_counts()/df.shape[0] *100, 2)
Out[58]:
Room_Type 1   77.55000
Room_Type 4   16.70000
Room_Type 6    2.66000
Room_Type 2    1.91000
Room_Type 5    0.73000
Room_Type 7    0.44000
Room_Type 3    0.02000
Name: room_type_reserved, dtype: float64

Observations:

  • Room_Type 1 is the most popular: 77.55% of the reservations are of this type.
  • The next most popular rooms are Room_Type 4 (16.70%), Room_Type 6 (2.66%) and Room_Type 2 (1.91%).
  • Room_Type 3 is the least popular (0.02%).

arrival_month

In [59]:
# Barplot for the column 'arrival_month' in the dataset:
labeled_barplot(df, 'arrival_month', perc=True)
In [60]:
df['arrival_month'].value_counts()
Out[60]:
10    5317
9     4611
8     3813
6     3203
12    3021
11    2980
7     2920
4     2736
5     2598
3     2358
2     1704
1     1014
Name: arrival_month, dtype: int64
In [61]:
(df['arrival_month'].value_counts()/df.shape[0])*100
Out[61]:
10   14.65748
9    12.71123
8    10.51137
6     8.82977
12    8.32805
11    8.21502
7     8.04962
4     7.54238
5     7.16196
3     6.50034
2     4.69745
1     2.79531
Name: arrival_month, dtype: float64

Observations:

  • The most popular months are October, September, and August, with around 37.88% of the reservations.
  • The least popular months are January, February, and March, with around 13.99% of the reservations.

arrival_year

In [62]:
# Barplot for the column 'arrival_year' in the dataset:
labeled_barplot(df, 'arrival_year', perc=True)

Observations:

  • The dataset contains reservation details for the years 2017 and 2018.
  • There is a substantial increase in the percentage of reservations made from 2017 to 2018.

arrival_date

In [63]:
# Barplot for the column 'arrival_date' in the dataset:
labeled_barplot(df, 'arrival_date', perc=True)

Observations:

  • Most days have an almost similar percentage of reservations (ranging from a high of 3.7% to a low of 2.7%)
  • The 31st day of the different months seems to have the least percentage of reservations.

market_segment_type

In [64]:
# Barplot for the column 'market_segment_type' in the dataset:
labeled_barplot(df, 'market_segment_type', perc=True, rotatn=90)
In [65]:
(df['market_segment_type'].value_counts()/df.shape[0]) *100
Out[65]:
Online          63.99449
Offline         29.02274
Corporate        5.56030
Complementary    1.07788
Aviation         0.34459
Name: market_segment_type, dtype: float64

Observations:

  • Almost 64% of the reservations come from the Online market segment.
  • The Aviation segment has the lowest percentage of reservations (about 0.3%).
  • The Offline segment accounts for 29% of the reservations, while around 5.56% of reservations belong to the Corporate segment.

no_of_special_requests

In [66]:
# Barplot for the column 'no_of_special_requests' in the dataset:
labeled_barplot(df, 'no_of_special_requests', perc=True)
In [67]:
(df['no_of_special_requests'].value_counts()/df.shape[0])*100
Out[67]:
0   54.51964
1   31.35217
2   12.03032
3    1.86079
4    0.21502
5    0.02205
Name: no_of_special_requests, dtype: float64

Observations:

  • The majority of the reservations (54.52%) do not have any special requests.
  • 31.35% of the reservations have one special request, while only 0.02% of the reservations have 5 special requests.

booking_status

In [68]:
# Barplot for the column 'booking_status' in the dataset:
labeled_barplot(df, 'booking_status', perc=True)

Observations:

  • The majority of the reservations (about 67.2%) do not get canceled.
  • About 32.8% of the reservations are canceled.

Multivariate Analysis

Correlation between the variables

In [69]:
cols_list = df.select_dtypes(include=np.number).columns.tolist()

df_corr = df[cols_list].corr()
df_corr
Out[69]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights required_car_parking_space lead_time arrival_year arrival_month arrival_date repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
no_of_adults 1.00000 -0.01979 0.10332 0.10562 0.01143 0.09729 0.07672 0.02184 0.02634 -0.19228 -0.04743 -0.11917 0.29689 0.18940 0.08692
no_of_children -0.01979 1.00000 0.02948 0.02440 0.03424 -0.04709 0.04598 -0.00308 0.02548 -0.03635 -0.01639 -0.02119 0.33773 0.12449 0.03308
no_of_weekend_nights 0.10332 0.02948 1.00000 0.17958 -0.03111 0.04660 0.05536 -0.00989 0.02730 -0.06711 -0.02069 -0.02631 -0.00452 0.06059 0.06156
no_of_week_nights 0.10562 0.02440 0.17958 1.00000 -0.04878 0.14965 0.03267 0.03738 -0.00930 -0.09976 -0.03008 -0.04934 0.02275 0.04599 0.09300
required_car_parking_space 0.01143 0.03424 -0.03111 -0.04878 1.00000 -0.06644 0.01568 -0.01550 -0.00004 0.11091 0.02711 0.06381 0.06130 0.08792 -0.08619
lead_time 0.09729 -0.04709 0.04660 0.14965 -0.06644 1.00000 0.14344 0.13681 0.00648 -0.13598 -0.04572 -0.07814 -0.06260 -0.10164 0.43854
arrival_year 0.07672 0.04598 0.05536 0.03267 0.01568 0.14344 1.00000 -0.33969 0.01885 -0.01818 0.00392 0.02642 0.17860 0.05321 0.17953
arrival_month 0.02184 -0.00308 -0.00989 0.03738 -0.01550 0.13681 -0.33969 1.00000 -0.04278 0.00034 -0.03861 -0.01072 0.05442 0.11055 -0.01123
arrival_date 0.02634 0.02548 0.02730 -0.00930 -0.00004 0.00648 0.01885 -0.04278 1.00000 -0.01595 -0.01254 -0.00150 0.01790 0.01835 0.01063
repeated_guest -0.19228 -0.03635 -0.06711 -0.09976 0.11091 -0.13598 -0.01818 0.00034 -0.01595 1.00000 0.39081 0.53916 -0.17490 -0.01182 -0.10729
no_of_previous_cancellations -0.04743 -0.01639 -0.02069 -0.03008 0.02711 -0.04572 0.00392 -0.03861 -0.01254 0.39081 1.00000 0.46815 -0.06334 -0.00332 -0.03373
no_of_previous_bookings_not_canceled -0.11917 -0.02119 -0.02631 -0.04934 0.06381 -0.07814 0.02642 -0.01072 -0.00150 0.53916 0.46815 1.00000 -0.11368 0.02738 -0.06018
avg_price_per_room 0.29689 0.33773 -0.00452 0.02275 0.06130 -0.06260 0.17860 0.05442 0.01790 -0.17490 -0.06334 -0.11368 1.00000 0.18438 0.14257
no_of_special_requests 0.18940 0.12449 0.06059 0.04599 0.08792 -0.10164 0.05321 0.11055 0.01835 -0.01182 -0.00332 0.02738 0.18438 1.00000 -0.25307
booking_status 0.08692 0.03308 0.06156 0.09300 -0.08619 0.43854 0.17953 -0.01123 0.01063 -0.10729 -0.03373 -0.06018 0.14257 -0.25307 1.00000
In [70]:
#Heatmap showing correlation values between different variables.
plt.figure(figsize=(12, 7))
sns.heatmap(df_corr, annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()

Observations:

  • No pair of variables shows a strong correlation.
  • The most strongly correlated pairs (comparatively) are:
    Variable 1                             Variable 2                     Correlation value
    no_of_previous_bookings_not_canceled   repeated_guest                 0.54
    no_of_previous_bookings_not_canceled   no_of_previous_cancellations   0.47
    booking_status                         lead_time                      0.44
    no_of_previous_cancellations           repeated_guest                 0.39
    avg_price_per_room                     no_of_children                 0.34
    avg_price_per_room                     no_of_adults                   0.30
  • avg_price_per_room is positively correlated with no_of_children, no_of_adults, no_of_week_nights, required_car_parking_space, arrival_year, arrival_month, arrival_date, no_of_special_requests and booking_status, and negatively correlated with no_of_weekend_nights, lead_time, repeated_guest, no_of_previous_cancellations and no_of_previous_bookings_not_canceled.

  • booking_status is positively correlated with no_of_adults, no_of_children, no_of_week_nights, no_of_weekend_nights, lead_time, arrival_date, avg_price_per_room and arrival_year, and negatively correlated with arrival_month, required_car_parking_space, no_of_special_requests, repeated_guest, no_of_previous_bookings_not_canceled and no_of_previous_cancellations.

  • arrival_month and arrival_year have the strongest negative correlation (-0.34).
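
The pairs listed above were read off the correlation matrix by eye. As a minimal sketch, the same ranking can be extracted programmatically by keeping the upper triangle of the matrix and sorting by absolute value. The helper name `top_correlated_pairs` and the `toy` frame are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(frame, n=5):
    """Return the n variable pairs with the largest absolute correlation."""
    corr = frame.corr()
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (always 1.0) is excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    # Order the pairs by absolute correlation, strongest first.
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(n)

# Hypothetical toy data, not the hotel dataset.
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [1, 3, 2, 5, 4], "c": [5, 4, 3, 2, 1]})
top = top_correlated_pairs(toy, n=3)
print(top)
```

In the notebook this could be called as `top_correlated_pairs(df[cols_list], n=6)`, which keeps the sign of each correlation while ranking by strength.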

market_segment_type vs avg_price_per_room

In [71]:
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x="market_segment_type", y="avg_price_per_room", palette="gist_rainbow")
plt.show()
In [72]:
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x="market_segment_type", y="avg_price_per_room")
plt.grid()
plt.show()
In [73]:
df.groupby(['market_segment_type'])['avg_price_per_room'].mean()
Out[73]:
market_segment_type
Aviation        100.70400
Complementary     3.14176
Corporate        82.91174
Offline          91.63268
Online          112.25685
Name: avg_price_per_room, dtype: float64

Observations:

  • Online reservations have the highest average price per room per day (112.26 euros), while Complementary reservations have the lowest (3.14 euros).
  • Aviation reservations have the next highest average price per room per day (100.70 euros), followed by the Offline (91.63 euros) and Corporate (82.91 euros) segments.

market_segment_type vs booking status

In [74]:
stacked_barplot(df, "market_segment_type", "booking_status")
booking_status           0      1    All
market_segment_type                     
All                  24390  11885  36275
Online               14739   8475  23214
Offline               7375   3153  10528
Corporate             1797    220   2017
Aviation                88     37    125
Complementary          391      0    391
------------------------------------------------------------------------------------------------------------------------

Observations:

  • The Online segment has both the highest number (8475) and the highest percentage (about 36.5%) of canceled reservations.
  • Complementary reservations do not get canceled.
  • Among the segments with any cancellations, Corporate has the lowest cancellation percentage (about 10.9%).
  • The Offline and Aviation segments have similar cancellation percentages (both close to 30%).
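
The cancellation percentages per segment follow directly from the counts printed by `stacked_barplot`. A small worked sketch, using only those printed counts (the dictionary layout is an illustrative choice):

```python
# Counts copied from the stacked_barplot output above:
# segment -> (not canceled, canceled)
segment_counts = {
    "Online": (14739, 8475),
    "Offline": (7375, 3153),
    "Corporate": (1797, 220),
    "Aviation": (88, 37),
    "Complementary": (391, 0),
}

# Cancellation rate = canceled / all reservations in the segment.
cancel_rate = {
    segment: 100 * canceled / (kept + canceled)
    for segment, (kept, canceled) in segment_counts.items()
}

# Print segments from the highest to the lowest cancellation rate.
for segment, rate in sorted(cancel_rate.items(), key=lambda item: -item[1]):
    print(f"{segment:<14}{rate:6.2f}%")
```

With these counts, Online comes out highest (about 36.5%), Offline and Aviation are both close to 30%, Corporate is about 10.9%, and Complementary is 0%.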

no_of_special_requests vs avg_price_per_room

In [75]:
plt.figure(figsize=(10, 5))
sns.boxplot(data = df, x='no_of_special_requests', y='avg_price_per_room')  
plt.show()
In [76]:
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x="no_of_special_requests", y="avg_price_per_room", ci= False)
plt.grid()
plt.show()
In [77]:
df.groupby(['no_of_special_requests'])['avg_price_per_room'].mean()
Out[77]:
no_of_special_requests
0    98.44070
1   105.53395
2   118.05876
3   118.29699
4   110.07103
5   118.12500
Name: avg_price_per_room, dtype: float64

Observations:

  • The average price per room per day generally rises with the number of special requests, from 98.44 euros with no requests to about 118 euros with 2, 3 or 5 requests.
  • The exception is reservations with 4 special requests, whose average price (110.07 euros) is lower than that of reservations with 3 (118.30 euros).

no_of_special_requests vs booking_status

In [78]:
stacked_barplot(df, "no_of_special_requests", "booking_status")
booking_status              0      1    All
no_of_special_requests                     
All                     24390  11885  36275
0                       11232   8545  19777
1                        8670   2703  11373
2                        3727    637   4364
3                         675      0    675
4                          78      0     78
5                           8      0      8
------------------------------------------------------------------------------------------------------------------------

Observations:

  • Reservations with a higher number of special requests (3, 4 and 5) are not canceled.
  • The percentage of cancellations is highest for reservations without any special requests.
  • The percentage of cancellations decreases as the number of special requests increases.

booking_status vs avg_price_per_room

In [79]:
distribution_plot_wrt_target(df, "avg_price_per_room", "booking_status")
In [80]:
df.loc[df['booking_status'] ==0,'avg_price_per_room'].describe()
Out[80]:
count   24390.00000
mean       99.93141
std        35.87215
min         0.00000
25%        77.86000
50%        95.00000
75%       119.10000
max       375.50000
Name: avg_price_per_room, dtype: float64
In [81]:
df.loc[df['booking_status'] ==1,'avg_price_per_room'].describe()
Out[81]:
count   11885.00000
mean      110.58997
std        32.26439
min         0.00000
25%        89.27000
50%       108.00000
75%       126.36000
max       540.00000
Name: avg_price_per_room, dtype: float64

Observations:

  • Canceled reservations have a higher mean average price per room per day (110.59 euros) than reservations that are not canceled (99.93 euros).
  • The median (108.00 vs 95.00 euros) and maximum (540.00 vs 375.50 euros) are also higher for canceled reservations; the minimum is 0 for both groups.

booking_status vs lead_time

In [82]:
distribution_plot_wrt_target(df, 'lead_time', 'booking_status')
In [83]:
df.loc[df['booking_status'] ==0,'lead_time'].describe()
Out[83]:
count   24390.00000
mean       58.92722
std        64.02871
min         0.00000
25%        10.00000
50%        39.00000
75%        86.00000
max       386.00000
Name: lead_time, dtype: float64
In [84]:
df.loc[df['booking_status'] ==1,'lead_time'].describe()
Out[84]:
count   11885.00000
mean      139.21548
std        98.94773
min         0.00000
25%        55.00000
50%       122.00000
75%       205.00000
max       443.00000
Name: lead_time, dtype: float64

Observations:

  • Canceled reservations have a much higher mean lead time, i.e. the number of days between the booking date and the arrival date (139.22 days), than reservations that are not canceled (58.93 days).
  • The median (122 vs 39 days) and maximum (443 vs 386 days) of 'lead_time' are also higher for canceled reservations; the minimum is 0 for both groups.
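
The two `describe()` calls above can also be combined so that both classes appear side by side in one table. A minimal sketch on made-up data (the `toy` frame is hypothetical and only mirrors the column names):

```python
import pandas as pd

# Hypothetical toy data, not the hotel dataset.
toy = pd.DataFrame({
    "booking_status": [0, 0, 0, 0, 1, 1, 1, 1],
    "lead_time": [0, 10, 40, 90, 0, 60, 120, 200],
})

# One describe() per class, stacked into a single table
# (rows: booking_status, columns: count, mean, std, min, quartiles, max).
summary = toy.groupby("booking_status")["lead_time"].describe()
print(summary[["mean", "min", "50%", "max"]])
```

In the notebook the equivalent call would be `df.groupby('booking_status')['lead_time'].describe()`, which makes it easier to compare the minimum, median and maximum of the two groups directly.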

Family data vs booking_status

In [85]:
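# no_of_children >= 0 holds for every row, so this keeps all reservations with more than one adult; .copy() avoids a SettingWithCopyWarning below
family_df = df[(df["no_of_children"] >= 0) & (df["no_of_adults"] > 1)].copy()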
family_df.shape
Out[85]:
(28441, 18)
In [86]:
family_df["no_of_family_members"] = family_df["no_of_adults"] + family_df["no_of_children"]
family_df.head()
Out[86]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status no_of_family_members
0 2 0 1 2 Meal Plan 1 0 Room_Type 1 224 2017 10 2 Offline 0 0 0 65.00000 0 0 2
1 2 0 2 3 Not Selected 0 Room_Type 1 5 2018 11 6 Online 0 0 0 106.68000 1 0 2
3 2 0 0 2 Meal Plan 1 0 Room_Type 1 211 2018 5 20 Online 0 0 0 100.00000 0 1 2
4 2 0 1 1 Not Selected 0 Room_Type 1 48 2018 4 11 Online 0 0 0 94.50000 0 1 2
5 2 0 0 2 Meal Plan 2 0 Room_Type 1 346 2018 9 13 Online 0 0 0 115.00000 1 1 2
In [87]:
stacked_barplot(family_df, 'no_of_family_members', 'booking_status')
booking_status            0     1    All
no_of_family_members                    
All                   18456  9985  28441
2                     15506  8213  23719
3                      2425  1368   3793
4                       514   398    912
5                        10     5     15
11                        0     1      1
12                        1     0      1
------------------------------------------------------------------------------------------------------------------------

Observations:

  • There are 28441 reservations with at least two adults in the dataset, which are treated here as family reservations.
  • Family sizes 2, 3 and 5 have similar cancellation percentages (about 33% to 36%), while reservations with 4 family members have a noticeably higher one (about 44%).
  • Of the two very large groups, the reservation with 11 family members was canceled while the one with 12 was not.

stay_data vs booking_status

In [88]:
stay_df = df[(df["no_of_week_nights"] > 0) & (df["no_of_weekend_nights"] > 0)].copy()
stay_df.shape
Out[88]:
(17094, 18)
In [89]:
stay_df["total_days"] = stay_df["no_of_week_nights"] + stay_df["no_of_weekend_nights"]
stay_df.head()
Out[89]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status total_days
0 2 0 1 2 Meal Plan 1 0 Room_Type 1 224 2017 10 2 Offline 0 0 0 65.00000 0 0 3
1 2 0 2 3 Not Selected 0 Room_Type 1 5 2018 11 6 Online 0 0 0 106.68000 1 0 5
2 1 0 2 1 Meal Plan 1 0 Room_Type 1 1 2018 2 28 Online 0 0 0 60.00000 0 1 3
4 2 0 1 1 Not Selected 0 Room_Type 1 48 2018 4 11 Online 0 0 0 94.50000 0 1 2
6 2 0 1 3 Meal Plan 1 0 Room_Type 1 34 2017 10 15 Online 0 0 0 107.55000 1 0 4
In [90]:
stacked_barplot(stay_df, 'total_days', 'booking_status')
booking_status      0     1    All
total_days                        
All             10979  6115  17094
3                3689  2183   5872
4                2977  1387   4364
5                1593   738   2331
2                1301   639   1940
6                 566   465   1031
7                 590   383    973
8                 100    79    179
10                 51    58    109
9                  58    53    111
14                  5    27     32
15                  5    26     31
13                  3    15     18
12                  9    15     24
11                 24    15     39
20                  3     8     11
19                  1     5      6
16                  1     5      6
17                  1     4      5
18                  0     3      3
21                  1     3      4
22                  0     2      2
23                  1     1      2
24                  0     1      1
------------------------------------------------------------------------------------------------------------------------

Observations:

  • The percentage of cancellations tends to increase with the total duration of the stay.
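
Because the long stays have very few records each, the trend is easier to see after grouping the durations. The sketch below uses only the counts printed above; the three bins are an illustrative assumption, not a choice made in the original analysis.

```python
# Counts copied from the stacked_barplot output above:
# total_days -> (not canceled, canceled)
stay_counts = {
    2: (1301, 639), 3: (3689, 2183), 4: (2977, 1387), 5: (1593, 738),
    6: (566, 465), 7: (590, 383), 8: (100, 79), 9: (58, 53), 10: (51, 58),
    11: (24, 15), 12: (9, 15), 13: (3, 15), 14: (5, 27), 15: (5, 26),
    16: (1, 5), 17: (1, 4), 18: (0, 3), 19: (1, 5), 20: (3, 8),
    21: (1, 3), 22: (0, 2), 23: (1, 1), 24: (0, 1),
}

# Illustrative bins (an assumption, chosen only to smooth the sparse long stays).
bins = {"2-5 days": range(2, 6), "6-10 days": range(6, 11), "11+ days": range(11, 25)}

binned_rate = {}
for label, days in bins.items():
    kept = sum(stay_counts[d][0] for d in days)
    canceled = sum(stay_counts[d][1] for d in days)
    binned_rate[label] = 100 * canceled / (kept + canceled)

for label, rate in binned_rate.items():
    print(f"{label:<10}{rate:6.1f}%")
```

With these bins the cancellation percentage rises from about 34% for stays of 2 to 5 days, to about 43% for 6 to 10 days, and to about 71% for stays of 11 days or more.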

repeated_guest vs booking_status

In [91]:
stacked_barplot(df, 'repeated_guest', 'booking_status')
booking_status      0      1    All
repeated_guest                     
All             24390  11885  36275
0               23476  11869  35345
1                 914     16    930
------------------------------------------------------------------------------------------------------------------------

Observations:

  • The percentage of cancellations is very low (about 1.7%) when the customer is a repeated guest.
  • The percentage of cancellations is much higher (about 33.6%) when the customer is not a repeated guest.

arrival_month vs booking_status

In [92]:
monthly_data = df.groupby(["arrival_month"])["booking_status"].count()
monthly_data
Out[92]:
arrival_month
1     1014
2     1704
3     2358
4     2736
5     2598
6     3203
7     2920
8     3813
9     4611
10    5317
11    2980
12    3021
Name: booking_status, dtype: int64
In [93]:
monthly_data = pd.DataFrame({"Month": list(monthly_data.index), "Guests": list(monthly_data.values)})
In [94]:
plt.figure(figsize=(10, 5))
sns.lineplot(data=monthly_data, x="Month", y="Guests")
plt.grid()
plt.show()
In [95]:
stacked_barplot (df, 'arrival_month', 'booking_status')
booking_status      0      1    All
arrival_month                      
All             24390  11885  36275
10               3437   1880   5317
9                3073   1538   4611
8                2325   1488   3813
7                1606   1314   2920
6                1912   1291   3203
4                1741    995   2736
5                1650    948   2598
11               2105    875   2980
3                1658    700   2358
2                1274    430   1704
12               2619    402   3021
1                 990     24   1014
------------------------------------------------------------------------------------------------------------------------

Observations:

  • The number of reservations increases from January to October, with slight dips in May and July, then drops sharply in November and stays near that level in December.
  • The busiest months in order are October, September, and August.
  • The least busy months are January, February, and March.
  • January has the lowest percentage of cancellations (about 2.4%), while July has the highest (45%).

arrival_month vs avg_price_per_room

In [96]:
# Box plot for showing distribution of room prices per month: 
plt.figure(figsize = (8,6))
sns.boxplot(data= df, x='arrival_month', y='avg_price_per_room')
plt.show()

Observations:

  • Average price per room per day seems to be higher for the months May, July, August and September.
  • For the month of March there seems to be an outlier far above the upper whisker of the price distribution, which implies that a room has been reserved at a very high price of more than 500 euros per day.
  • There is a high number of outliers at the upper end of the distribution of prices for every month.

room_type vs repeated_guests

In [97]:
stacked_barplot (df, 'room_type_reserved', 'repeated_guest')
repeated_guest          0    1    All
room_type_reserved                   
All                 35345  930  36275
Room_Type 1         27322  808  28130
Room_Type 4          5990   67   6057
Room_Type 7           137   21    158
Room_Type 5           248   17    265
Room_Type 6           956   10    966
Room_Type 2           685    7    692
Room_Type 3             7    0      7
------------------------------------------------------------------------------------------------------------------------

Observations:

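  • Room_Type 7 has the highest share of repeated guests (about 13.3% of its reservations), followed by Room_Type 5 (about 6.4%) and Room_Type 1 (about 2.9%). In absolute numbers, most repeated guests (808 of 930) book Room_Type 1.
  • Room_Type 3 has no reservations from repeated guests.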
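
Whether a room type is "preferred" by repeated guests depends on whether it is ranked by count or by share. A short worked sketch using only the counts printed above shows that the two rankings differ:

```python
# Counts copied from the stacked_barplot output above:
# room type -> (non-repeated guests, repeated guests)
room_counts = {
    "Room_Type 1": (27322, 808),
    "Room_Type 2": (685, 7),
    "Room_Type 3": (7, 0),
    "Room_Type 4": (5990, 67),
    "Room_Type 5": (248, 17),
    "Room_Type 6": (956, 10),
    "Room_Type 7": (137, 21),
}

# Ranking by the absolute number of repeated guests.
by_count = max(room_counts, key=lambda room: room_counts[room][1])

# Ranking by the share of repeated guests within each room type.
share = {room: 100 * rep / (new + rep) for room, (new, rep) in room_counts.items()}
by_share = max(share, key=share.get)

print(by_count, by_share, round(share[by_share], 1))
```

Ranking by count favours Room_Type 1 simply because it is by far the most booked room, while ranking by share highlights Room_Type 7.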

no_of_previous_bookings_not_canceled vs repeated_guests

In [98]:
# Box plot for showing distribution of no_of_previous_bookings_not_canceled vs repeated_guests: 
plt.figure(figsize = (8,6))
sns.boxplot(data= df, x='repeated_guest', y='no_of_previous_bookings_not_canceled')
plt.show()

Observations:

  • Repeated guests do not seem to have many reservations that they cancel.

Questions

Q1. What are the busiest months in the hotel?
In [99]:
labeled_barplot(df, 'arrival_month', perc=True)
In [100]:
df['arrival_month'].value_counts()
Out[100]:
10    5317
9     4611
8     3813
6     3203
12    3021
11    2980
7     2920
4     2736
5     2598
3     2358
2     1704
1     1014
Name: arrival_month, dtype: int64

Observations:

  • The top 5 busiest months in order are as follows:
    Month       No. of reservations   Percentage of reservations (%)
    October     5317                  14.7
    September   4611                  12.7
    August      3813                  10.5
    June        3203                   8.8
    December    3021                   8.3
  • The 5 least busy months, starting with the least busy, are as follows:
    Month       No. of reservations   Percentage of reservations (%)
    January     1014                   2.8
    February    1704                   4.7
    March       2358                   6.5
    May         2598                   7.2
    April       2736                   7.5
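
The month names and percentages in these tables can be derived from the printed counts rather than typed by hand. A minimal sketch using the standard-library `calendar` module (month names are English under the default locale; the variable names are illustrative):

```python
import calendar

# Reservation counts per arrival month, copied from the value_counts output above.
month_counts = {
    10: 5317, 9: 4611, 8: 3813, 6: 3203, 12: 3021, 11: 2980,
    7: 2920, 4: 2736, 5: 2598, 3: 2358, 2: 1704, 1: 1014,
}
total = sum(month_counts.values())

# (month name, count, percentage) sorted from busiest to least busy.
ranked = [
    (calendar.month_name[m], n, round(100 * n / total, 1))
    for m, n in sorted(month_counts.items(), key=lambda item: -item[1])
]

for name, n, pct in ranked[:5]:
    print(f"{name:<10}{n:>6}{pct:>6}")
```

Reversing the list (`ranked[::-1][:5]`) gives the five least busy months in the same format.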
Q2. Which market segment do most of the guests come from?
In [101]:
labeled_barplot(df, 'market_segment_type', perc=True, rotatn=90)
In [102]:
market_seg_df = df.groupby(['market_segment_type']).agg(no=('market_segment_type','count')).reset_index()
market_seg_df['percent'] = market_seg_df['no']/df.shape[0] *100
market_seg_df
Out[102]:
market_segment_type no percent
0 Aviation 125 0.34459
1 Complementary 391 1.07788
2 Corporate 2017 5.56030
3 Offline 10528 29.02274
4 Online 23214 63.99449

Observations:

  • Almost 64% of the guests belong to the 'Online' market type, followed by 29% who belong to the 'Offline' market type.
  • 5.6% of reservations are attributed to Corporate market type.
  • The Aviation and Complementary market types have the smallest percentages of reservations (about 0.34% and 1.08%).
Q3. Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments?
In [103]:
sns.lineplot(data=df, x='market_segment_type', y='avg_price_per_room')
plt.grid()
plt.show()
In [104]:
market_seg_price_df = df.groupby(['market_segment_type']).agg(average_price=('avg_price_per_room','mean')).reset_index()
market_seg_price_df.sort_values(by='average_price', ascending= False)
Out[104]:
market_segment_type average_price
4 Online 112.25685
0 Aviation 100.70400
3 Offline 91.63268
2 Corporate 82.91174
1 Complementary 3.14176

Observations:

  • The average price per room per day for each market segment is as follows:
    Market segment   Average price per room per day (euros)
    Online           112.26
    Aviation         100.70
    Offline           91.63
    Corporate         82.91
    Complementary      3.14
Q4. What percentage of bookings are canceled?
In [105]:
labeled_barplot(df, 'booking_status', perc=True)
In [106]:
df.groupby(['booking_status'])['booking_status'].count()
Out[106]:
booking_status
0    24390
1    11885
Name: booking_status, dtype: int64
In [107]:
df.groupby(['booking_status'])['booking_status'].count()/df.shape[0] *100
Out[107]:
booking_status
0   67.23639
1   32.76361
Name: booking_status, dtype: float64

Observations:

  • 32.76% of the reservations are canceled (booking_status = 1).
  • 67.24% of the reservations are not canceled (booking_status = 0).
Q5. Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel?
In [108]:
# catplot is a figure-level function that creates its own figure, so its size is set with height/aspect
sns.catplot(data=df, x='repeated_guest', hue='booking_status', kind='count', height=5, aspect=0.8)
plt.show()
In [109]:
guests_df = df.groupby(['repeated_guest','booking_status']).agg(number=('repeated_guest','count')).reset_index()
guests_df
Out[109]:
repeated_guest booking_status number
0 0 0 23476
1 0 1 11869
2 1 0 914
3 1 1 16
In [110]:
repeated_guests_df = guests_df.query('repeated_guest==1').copy()
repeated_guests_df['percent'] = repeated_guests_df['number']/repeated_guests_df['number'].sum() *100
repeated_guests_df
Out[110]:
repeated_guest booking_status number percent
2 1 0 914 98.27957
3 1 1 16 1.72043

Observations:

  • Of the 930 reservations made by repeated guests:
    • 16 were canceled, which corresponds to around 1.72%.
    • 914 were not canceled, which corresponds to around 98.28%.
Q6. Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?
In [111]:
# catplot is a figure-level function that creates its own figure, so its size is set with height/aspect
sns.catplot(data=df, x='no_of_special_requests', hue='booking_status', kind='count', height=5, aspect=0.8)
plt.show()
In [112]:
stacked_barplot(df, "no_of_special_requests", "booking_status")
booking_status              0      1    All
no_of_special_requests                     
All                     24390  11885  36275
0                       11232   8545  19777
1                        8670   2703  11373
2                        3727    637   4364
3                         675      0    675
4                          78      0     78
5                           8      0      8
------------------------------------------------------------------------------------------------------------------------
In [113]:
sp_request_df = df.groupby(['no_of_special_requests','booking_status']).agg(number=('no_of_special_requests','count')).reset_index()
sp_request_df
Out[113]:
no_of_special_requests booking_status number
0 0 0 11232
1 0 1 8545
2 1 0 8670
3 1 1 2703
4 2 0 3727
5 2 1 637
6 3 0 675
7 4 0 78
8 5 0 8
In [114]:
special_req_0 = sp_request_df.query('no_of_special_requests == 0').copy()
special_req_0['percent'] = special_req_0['number']/special_req_0['number'].sum() *100
print(special_req_0)
percent_0_req = round(special_req_0.loc[1,'percent'],2)
   no_of_special_requests  booking_status  number  percent
0                       0               0   11232 56.79324
1                       0               1    8545 43.20676
In [115]:
special_req_1 = sp_request_df.query('no_of_special_requests == 1').copy()
special_req_1['percent'] = special_req_1['number']/special_req_1['number'].sum() *100
print(special_req_1)
percent_1_req = round(special_req_1.loc[3,'percent'],2)
   no_of_special_requests  booking_status  number  percent
2                       1               0    8670 76.23318
3                       1               1    2703 23.76682
In [116]:
special_req_2 = sp_request_df.query('no_of_special_requests == 2').copy()
special_req_2['percent'] = special_req_2['number']/special_req_2['number'].sum() *100
print(special_req_2)
percent_2_req = round(special_req_2.loc[5,'percent'],2)
   no_of_special_requests  booking_status  number  percent
4                       2               0    3727 85.40330
5                       2               1     637 14.59670
In [117]:
special_req_3 = sp_request_df.query('no_of_special_requests == 3').copy()
special_req_3['percent'] = special_req_3['number']/special_req_3['number'].sum() *100
print(special_req_3)
   no_of_special_requests  booking_status  number   percent
6                       3               0     675 100.00000
In [118]:
special_req_4 = sp_request_df.query('no_of_special_requests == 4').copy()
special_req_4['percent'] = special_req_4['number']/special_req_4['number'].sum() *100
print(special_req_4)
   no_of_special_requests  booking_status  number   percent
7                       4               0      78 100.00000
In [119]:
special_req_5 = sp_request_df.query('no_of_special_requests == 5').copy()
special_req_5['percent'] = special_req_5['number']/special_req_5['number'].sum() *100
print(special_req_5)
   no_of_special_requests  booking_status  number   percent
8                       5               0       8 100.00000
In [120]:
print(f'Out of the reservations made which had no special requests, {percent_0_req}% of the reservations were cancelled.')
print(f'Out of the reservations made which had 1 special request, {percent_1_req}% of the reservations were cancelled.')
print(f'Out of the reservations made which had 2 special requests, {percent_2_req}% of the reservations were cancelled.')
Out of the reservations made which had no special requests, 43.21% of the reservations were cancelled.
Out of the reservations made which had 1 special request, 23.77% of the reservations were cancelled.
Out of the reservations made which had 2 special requests, 14.6% of the reservations were cancelled.

Observations:

  • The percentage of cancellations decreases as the number of special requests increases.
  • Reservations with a higher number of special requests (3, 4, and 5) are not canceled.
  • The percentage of cancellations is the highest for reservations without any special requests.
  • The following table shows the percentage of cancellations by number of special requests:
No. of special requests   Canceled reservations (%)
0                         43.21
1                         23.77
2                         14.60
3                         0
4                         0
5                         0
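
The per-group percentages above can also be obtained in a single pass. Since booking_status is coded as 1 for canceled bookings, its mean within each group is the cancellation rate. The following is a minimal sketch on a small made-up frame (the name `toy` and its values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for the bookings data (hypothetical values, not the real dataset).
toy = pd.DataFrame({
    "no_of_special_requests": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2],
    "booking_status":         [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})

# booking_status is 1 for canceled bookings, so the group mean is the cancellation rate.
cancel_rate = (
    toy.groupby("no_of_special_requests")["booking_status"]
    .mean()
    .mul(100)
    .round(2)
)
print(cancel_rate)
```

Unlike the per-value queries, this form also returns an explicit 0 for groups in which no booking was canceled.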

Data Preprocessing

Missing value treatment

Since there are no missing values, no missing value treatment is required.

Outlier Detection:

In [121]:
# outlier detection using boxplot
numeric_columns = df.select_dtypes(include=np.number).columns.tolist()
# dropping booking_status
numeric_columns.remove("booking_status")

plt.figure(figsize=(15, 12))

for i, variable in enumerate(numeric_columns):
    plt.subplot(4, 4, i + 1)
    plt.boxplot(df[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)

plt.show()

Observations:

  • There are quite a few outliers in the data. Most of them are left untreated because they are plausible values; the exceptions are listed below.
  • In the case of 'avg_price_per_room', there is only 1 value (540 euros) that is very high compared to the rest of the data points in the column, so we assign it the upper whisker value of the column distribution.
  • In the case of 'no_of_children', there are only 3 records with a large number of children (9 and 10), so we replace these values with 3.
  • In the case of the length of stay, there are 78 records that satisfy the condition (no_of_weekend_nights==0 & no_of_week_nights==0), which implies a stay of 0 days. These records might contain missing or incorrect data, so further analysis is done after dropping these 78 records.
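
The upper-whisker replacement described above can be sketched as follows. This is only an illustration on made-up prices; the helper name `upper_whisker` and the sample values are assumptions, while the 500 threshold follows the text.

```python
import pandas as pd

def upper_whisker(series: pd.Series, whis: float = 1.5) -> float:
    """Return Q3 + whis * IQR, the upper limit used for boxplot whiskers."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    return q3 + whis * (q3 - q1)

# Hypothetical prices with one extreme value (not the real column).
prices = pd.Series([80.0, 90.0, 100.0, 110.0, 120.0, 540.0])
bound = upper_whisker(prices)

# Keep values up to 500 as they are and replace anything above with the bound.
capped = prices.where(prices <= 500, bound)
print(bound, capped.max())
```

Only values above the chosen threshold are replaced, so other points beyond the whisker stay untouched, which matches the decision to keep most outliers.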
In [122]:
# Checking no of records that have avg_price_per_room > 500:
df.loc[df["avg_price_per_room"] >500].shape[0]
Out[122]:
1
In [123]:
# Calculating upper whisker value for box plot for 'avg_price_per_room' column:
q1 = df["avg_price_per_room"].quantile(0.25)
q3 = df["avg_price_per_room"].quantile(0.75)  
IQR = q3 - q1

upper_Whisker = q3 + 1.5 * IQR
upper_Whisker
Out[123]:
179.55
In [124]:
# Replacing avg_price_per_room values >= 500 (the single 540 record) with the upper whisker value:
df.loc[df["avg_price_per_room"] >= 500, "avg_price_per_room"] = upper_Whisker
In [125]:
# Replacing records with no_of_children = 9 and 10 with 3:
df["no_of_children"] = df["no_of_children"].replace([9, 10], 3)
In [126]:
# Checking no of records that have 0 no_of_week_nights and 0 no_of_weekend_nights:
no_days_df = df.loc[(df['no_of_weekend_nights']==0) & (df['no_of_week_nights']==0)]
remove_index = no_days_df.index.to_list()
In [127]:
# Dropping records that have 0 no_of_week_nights and 0 no_of_weekend_nights:
df.drop(index=remove_index, inplace=True)
In [128]:
df.loc[(df['no_of_weekend_nights']==0) & (df['no_of_week_nights']==0)]
Out[128]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status

Feature Engineering:

Since 'booking_status' is a flag column indicating whether the booking was canceled or not, we change the 'Not_Canceled' values to 0 and the 'Canceled' values to 1 for further analysis.

This step has already been carried out before the EDA section.

Creating separate datasets for logistic regression and decision tree model (for easy analysis):

In [129]:
df_log_reg = df.copy()
In [130]:
df_dtree = df.copy()

Exploratory Data Analysis (EDA)

This section checks whether outlier treatment and dropping of rows have resulted in any major changes to the dataset.

Statistical Summary of the dataset
In [131]:
df.describe().T
Out[131]:
count mean std min 25% 50% 75% max
no_of_adults 36197.00000 1.84543 0.51864 0.00000 2.00000 2.00000 2.00000 4.00000
no_of_children 36197.00000 0.10479 0.39472 0.00000 0.00000 0.00000 0.00000 3.00000
no_of_weekend_nights 36197.00000 0.81247 0.87077 0.00000 0.00000 1.00000 2.00000 7.00000
no_of_week_nights 36197.00000 2.20905 1.40870 0.00000 1.00000 2.00000 3.00000 17.00000
required_car_parking_space 36197.00000 0.03105 0.17346 0.00000 0.00000 0.00000 0.00000 1.00000
lead_time 36197.00000 85.31395 85.93693 0.00000 17.00000 57.00000 126.00000 443.00000
arrival_year 36197.00000 2017.82068 0.38363 2017.00000 2018.00000 2018.00000 2018.00000 2018.00000
arrival_month 36197.00000 7.42360 3.06860 1.00000 5.00000 8.00000 10.00000 12.00000
arrival_date 36197.00000 15.59801 8.74140 1.00000 8.00000 16.00000 23.00000 31.00000
repeated_guest 36197.00000 0.02558 0.15789 0.00000 0.00000 0.00000 0.00000 1.00000
no_of_previous_cancellations 36197.00000 0.02337 0.36869 0.00000 0.00000 0.00000 0.00000 13.00000
no_of_previous_bookings_not_canceled 36197.00000 0.15347 1.75579 0.00000 0.00000 0.00000 0.00000 58.00000
avg_price_per_room 36197.00000 103.63645 34.72348 0.00000 80.75000 99.60000 120.12000 375.50000
no_of_special_requests 36197.00000 0.61947 0.78637 0.00000 0.00000 0.00000 1.00000 5.00000
booking_status 36197.00000 0.32829 0.46960 0.00000 0.00000 0.00000 1.00000 1.00000

Observations:

  • The number of rows in the dataset has been reduced from 36275 to 36197 due to the dropping of 78 records.
  • The means and standard deviations of several columns have changed slightly.
  • The following table shows the changes in the statistical summary for the columns no_of_children, no_of_weekend_nights, no_of_week_nights, and avg_price_per_room:
Statistic   Column                 Value before treatment   Value after treatment
mean        avg_price_per_room     103.42354                103.63645
std         avg_price_per_room     35.08942                 34.72348
25%         avg_price_per_room     80.30000                 80.75000
50%         avg_price_per_room     99.45000                 99.60000
max         avg_price_per_room     540.00                   375.50
mean        no_of_children         0.10528                  0.10479
std         no_of_children         0.40265                  0.39472
max         no_of_children         10                       3
mean        no_of_weekend_nights   0.81072                  0.81247
std         no_of_weekend_nights   0.87064                  0.87077
mean        no_of_week_nights      2.20430                  2.20905
std         no_of_week_nights      1.41090                  1.40870
  • There are no other major changes in the statistical summary of the dataset.
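
A before/after comparison like the one above can be assembled directly from two describe() outputs. The sketch below uses two small hypothetical frames (`before` and `after` are illustrative names; in the notebook they would correspond to the data before and after treatment):

```python
import pandas as pd

# Hypothetical data before treatment (not the real dataset).
before = pd.DataFrame({
    "avg_price_per_room": [80.0, 100.0, 120.0, 540.0],
    "no_of_children": [0, 1, 2, 10],
})

# Apply the same kind of treatment as in the text to a copy.
after = before.copy()
after.loc[after["avg_price_per_room"] > 500, "avg_price_per_room"] = 150.0
after["no_of_children"] = after["no_of_children"].replace([9, 10], 3)

# Put selected summary statistics side by side under "before"/"after" column groups.
stats = ["mean", "std", "25%", "50%", "max"]
comparison = pd.concat(
    {"before": before.describe().T[stats], "after": after.describe().T[stats]},
    axis=1,
)
print(comparison)
```

Building the table from the data avoids copying numbers by hand between cells.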

Univariate Analysis

avg_price_per_room
In [132]:
# Histplot and Boxplot to show the distribution of data for the column 'avg_price_per_room':
creating_hist_box(df, 'avg_price_per_room', bins=40)
In [133]:
df['avg_price_per_room'].describe()
Out[133]:
count   36197.00000
mean      103.63645
std        34.72348
min         0.00000
25%        80.75000
50%        99.60000
75%       120.12000
max       375.50000
Name: avg_price_per_room, dtype: float64

Observations:

  • The maximum value of the column has been changed from 540 euros to 375.50 euros.
  • The mean of the column has increased very slightly from 103.42354 euros to 103.63645 euros.
  • The standard deviation of the column has decreased very slightly from 35.08942 euros to 34.72348 euros.
  • The 25th percentile and median values of the column have also increased very slightly.
no_of_children
In [134]:
# Barplot for the column 'no_of_children' in the dataset:
labeled_barplot(df, 'no_of_children', perc=True)
In [135]:
df['no_of_children'].value_counts()/df.shape[0] * 100
Out[135]:
0   92.56016
1    4.46170
2    2.91737
3    0.06078
Name: no_of_children, dtype: float64

Observations:

  • There are no high values (i.e., 9 and 10) left in the column 'no_of_children', since they have been replaced with 3.
  • The percentage of records with 3 children has increased slightly (from 0.05238 to 0.06078).
  • There are no substantial changes in the percentage distribution of the column 'no_of_children'.
no_of_week_nights
In [136]:
# Barplot for the column 'no_of_week_nights' in the dataset:
labeled_barplot(df, 'no_of_week_nights', perc=True)
In [137]:
(df['no_of_week_nights'].value_counts()/df.shape[0] * 100).head()
Out[137]:
2   31.61588
1   26.21212
3   21.65649
4    8.26035
0    6.37898
Name: no_of_week_nights, dtype: float64
In [138]:
(df['no_of_week_nights'].value_counts()/df.shape[0] * 100).tail()
Out[138]:
12   0.02486
14   0.01934
13   0.01381
17   0.00829
16   0.00553
Name: no_of_week_nights, dtype: float64

Observations:

  • No major changes have occurred due to dropping of records.
no_of_weekend_nights
In [139]:
# Barplot for the column 'no_of_weekend_nights' in the dataset:
labeled_barplot(df, 'no_of_weekend_nights', perc=True)
In [140]:
(df['no_of_weekend_nights'].value_counts()/df.shape[0] * 100)
Out[140]:
0   46.39611
1   27.61279
2   25.06009
3    0.42269
4    0.35638
5    0.09393
6    0.05525
7    0.00276
Name: no_of_weekend_nights, dtype: float64

Observations:

  • No major changes have occurred due to dropping of records.
booking_status
In [141]:
# Barplot for the column 'booking_status' in the dataset:
labeled_barplot(df, 'booking_status', perc=True)

Observations:

  • No major changes have occurred due to dropping of records.

Multivariate Analysis

Correlation between the variables
In [142]:
cols_list = df.select_dtypes(include=np.number).columns.tolist()

df_corr = df[cols_list].corr()
df_corr
Out[142]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights required_car_parking_space lead_time arrival_year arrival_month arrival_date repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
no_of_adults 1.00000 -0.02007 0.10270 0.10462 0.01128 0.09682 0.07647 0.02238 0.02669 -0.19227 -0.04742 -0.11918 0.29793 0.18933 0.08641
no_of_children -0.02007 1.00000 0.02936 0.02445 0.03519 -0.04700 0.04838 -0.00248 0.02641 -0.03725 -0.01664 -0.02153 0.34942 0.12660 0.03381
no_of_weekend_nights 0.10270 0.02936 1.00000 0.17707 -0.03150 0.04580 0.05489 -0.00990 0.02725 -0.06699 -0.02077 -0.02637 -0.01030 0.06092 0.06036
no_of_week_nights 0.10462 0.02445 0.17707 1.00000 -0.04952 0.14871 0.03180 0.03756 -0.00952 -0.09969 -0.03026 -0.04953 0.01326 0.04652 0.09112
required_car_parking_space 0.01128 0.03519 -0.03150 -0.04952 1.00000 -0.06668 0.01559 -0.01553 -0.00006 0.11121 0.02710 0.06382 0.06092 0.08805 -0.08649
lead_time 0.09682 -0.04700 0.04580 0.14871 -0.06668 1.00000 0.14331 0.13696 0.00683 -0.13588 -0.04576 -0.07815 -0.06605 -0.10180 0.43835
arrival_year 0.07647 0.04838 0.05489 0.03180 0.01559 0.14331 1.00000 -0.33916 0.01880 -0.01822 0.00404 0.02654 0.17891 0.05302 0.17957
arrival_month 0.02238 -0.00248 -0.00990 0.03756 -0.01553 0.13696 -0.33916 1.00000 -0.04216 -0.00024 -0.03876 -0.01089 0.05560 0.11081 -0.01125
arrival_date 0.02669 0.02641 0.02725 -0.00952 -0.00006 0.00683 0.01880 -0.04216 1.00000 -0.01617 -0.01246 -0.00149 0.01747 0.01831 0.01065
repeated_guest -0.19227 -0.03725 -0.06699 -0.09969 0.11121 -0.13588 -0.01822 -0.00024 -0.01617 1.00000 0.39124 0.53945 -0.17622 -0.01193 -0.10731
no_of_previous_cancellations -0.04742 -0.01664 -0.02077 -0.03026 0.02710 -0.04576 0.00404 -0.03876 -0.01246 0.39124 1.00000 0.46806 -0.06425 -0.00325 -0.03379
no_of_previous_bookings_not_canceled -0.11918 -0.02153 -0.02637 -0.04953 0.06382 -0.07815 0.02654 -0.01089 -0.00149 0.53945 0.46806 1.00000 -0.11509 0.02745 -0.06023
avg_price_per_room 0.29793 0.34942 -0.01030 0.01326 0.06092 -0.06605 0.17891 0.05560 0.01747 -0.17622 -0.06425 -0.11509 1.00000 0.18762 0.13976
no_of_special_requests 0.18933 0.12660 0.06092 0.04652 0.08805 -0.10180 0.05302 0.11081 0.01831 -0.01193 -0.00325 0.02745 0.18762 1.00000 -0.25319
booking_status 0.08641 0.03381 0.06036 0.09112 -0.08649 0.43835 0.17957 -0.01125 0.01065 -0.10731 -0.03379 -0.06023 0.13976 -0.25319 1.00000
In [143]:
#Heatmap showing correlation values between different variables.
plt.figure(figsize=(12, 7))
sns.heatmap(df_corr, annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()

Observations:

  • The following correlation values changed slightly after outlier treatment and dropping of records:
Variables                                                   Corr. before treatment   Corr. after treatment
avg_price_per_room & repeated_guest                         -0.17                    -0.18
required_car_parking_space & no_of_children                  0.03                     0.04
arrival_year & no_of_weekend_nights                          0.06                     0.05
avg_price_per_room & no_of_children                          0.34                     0.35
avg_price_per_room & no_of_weekend_nights                   -0.00                    -0.01
avg_price_per_room & no_of_week_nights                       0.02                     0.01
avg_price_per_room & lead_time                              -0.06                    -0.07
avg_price_per_room & arrival_month                           0.05                     0.06
avg_price_per_room & no_of_previous_bookings_not_canceled   -0.11                    -0.12
no_of_special_requests & no_of_children                      0.12                     0.13
no_of_special_requests & avg_price_per_room                  0.18                     0.19
  • There are no other major changes in the correlation values of the different columns.
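
The list of changed pairs above can be produced programmatically by comparing the two rounded correlation matrices. The following is a sketch under the assumption that both frames share the same numeric columns; the function name `changed_correlations` and the toy frames are hypothetical.

```python
import numpy as np
import pandas as pd

def changed_correlations(before: pd.DataFrame, after: pd.DataFrame, decimals: int = 2) -> pd.DataFrame:
    """List variable pairs whose rounded correlation differs between two frames."""
    corr_b = before.select_dtypes(include="number").corr().round(decimals)
    corr_a = after.select_dtypes(include="number").corr().round(decimals)
    # Keep only the upper triangle so that each pair appears once.
    mask = np.triu(np.ones(corr_b.shape, dtype=bool), k=1)
    pairs = pd.DataFrame({
        "before": corr_b.where(mask).stack(),
        "after": corr_a.where(mask).stack(),
    }).dropna()
    return pairs[pairs["before"] != pairs["after"]]

# Hypothetical frames: the extreme y value is replaced in the "after" version.
before = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 100], "z": [5, 4, 3, 2, 1]})
after = before.copy()
after["y"] = after["y"].replace(100, 10)

result = changed_correlations(before, after)
print(result)
```

Pairs whose correlation is identical after rounding, such as x and z here, are left out of the result.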
market_segment_type vs avg_price_per_room
In [144]:
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x="market_segment_type", y="avg_price_per_room", palette="gist_rainbow")
plt.show()
In [145]:
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x="market_segment_type", y="avg_price_per_room")
plt.grid()
plt.show()
In [146]:
df.groupby(['market_segment_type'])['avg_price_per_room'].mean()
Out[146]:
market_segment_type
Aviation        100.70400
Complementary     3.24981
Corporate        82.91174
Offline          91.59844
Online          112.57206
Name: avg_price_per_room, dtype: float64

Observations:

  • The plots suggest that there are no major changes after outlier treatment.
no_of_special_requests vs avg_price_per_room
In [147]:
plt.figure(figsize=(10, 5))
sns.boxplot(data = df, x='no_of_special_requests', y='avg_price_per_room')  
plt.show()
In [148]:
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x="no_of_special_requests", y="avg_price_per_room", ci= False)
plt.grid()
plt.show()
In [149]:
df.groupby(['no_of_special_requests'])['avg_price_per_room'].mean()
Out[149]:
no_of_special_requests
0    98.59197
1   105.85040
2   118.30274
3   118.47251
4   110.07103
5   118.12500
Name: avg_price_per_room, dtype: float64

Observations:

  • The plots suggest that there are no major changes after outlier treatment.
booking_status vs avg_price_per_room
In [150]:
distribution_plot_wrt_target(df, "avg_price_per_room", "booking_status")
In [151]:
df.loc[df['booking_status'] ==0,'avg_price_per_room'].describe()
Out[151]:
count   24314.00000
mean      100.24377
std        35.48972
min         0.00000
25%        78.00000
50%        95.00000
75%       119.47500
max       375.50000
Name: avg_price_per_room, dtype: float64
In [152]:
df.loc[df['booking_status'] ==1,'avg_price_per_room'].describe()
Out[152]:
count   11883.00000
mean      110.57825
std        31.99983
min         0.00000
25%        89.40000
50%       108.00000
75%       126.36000
max       365.00000
Name: avg_price_per_room, dtype: float64

Observations:

  • The number of records has dropped for both booking statuses, canceled and not canceled.
  • No major changes have occurred due to outlier treatment.
Overall Observation on EDA after outlier treatment and dropping of records:
  • No significant changes have been introduced in the dataset by outlier treatment and dropping of records.

Questions

The questions below are revisited to check whether outlier treatment and dropping of rows have resulted in any major changes.

Q1. What are the busiest months in the hotel?
In [153]:
labeled_barplot(df, 'arrival_month', perc=True)
In [154]:
df['arrival_month'].value_counts()
Out[154]:
10    5302
9     4603
8     3810
6     3197
12    3012
11    2971
7     2916
4     2731
5     2595
3     2356
2     1695
1     1009
Name: arrival_month, dtype: int64

Observations:

  • No significant changes have occurred due to the dropping of records.
  • The following changes have occurred:
Arrival month   Reservations before dropping records (%)   Reservations after dropping records (%)
October         14.7                                       14.6
July            8.0                                        8.1
Q2. Which market segment do most of the guests come from?
In [155]:
labeled_barplot(df, 'market_segment_type', perc=True, rotatn=90)
In [156]:
market_seg_df = df.groupby(['market_segment_type']).agg(no=('market_segment_type','count')).reset_index()
market_seg_df['percent'] = market_seg_df['no']/df.shape[0] *100
market_seg_df
Out[156]:
market_segment_type no percent
0 Aviation 125 0.34533
1 Complementary 378 1.04429
2 Corporate 2017 5.57228
3 Offline 10528 29.08528
4 Online 23149 63.95281

Observations:

  • No significant changes have occurred due to the dropping of records.
  • The following changes have occurred in the percentage of guests belonging to different market segments:
Market segment   Reservations before dropping records (%)   Reservations after dropping records (%)
Offline          29.0                                       29.1
Complementary    1.1                                        1.0
Q3. Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments?
In [157]:
sns.lineplot(data=df, x='market_segment_type', y='avg_price_per_room')
plt.grid()
plt.show()
In [158]:
market_seg_price_df = df.groupby(['market_segment_type']).agg(average_price=('avg_price_per_room','mean')).reset_index()
market_seg_price_df.sort_values(by='average_price', ascending= False)
Out[158]:
market_segment_type average_price
4 Online 112.57206
0 Aviation 100.70400
3 Offline 91.59844
2 Corporate 82.91174
1 Complementary 3.24981

Observations:

  • No significant changes have occurred due to the dropping of records.
  • The following changes have occurred:
Market segment   Average price per room before dropping records (euros)   Average price per room after dropping records (euros)
Online           112.26                                                   112.57
Complementary    3.14                                                     3.25
Q4. What percentage of bookings are canceled?
In [159]:
labeled_barplot(df, 'booking_status', perc=True)
In [160]:
df.groupby(['booking_status'])['booking_status'].count()
Out[160]:
booking_status
0    24314
1    11883
Name: booking_status, dtype: int64
In [161]:
df.groupby(['booking_status'])['booking_status'].count()/df.shape[0] *100
Out[161]:
booking_status
0   67.17131
1   32.82869
Name: booking_status, dtype: float64

Observations:

  • No significant changes have occurred due to the dropping of records.
  • The following changes have occurred:
Booking status   Reservations before dropping records (%)   Reservations after dropping records (%)
Not Canceled     67.24                                      67.17
Canceled         32.76                                      32.83
Q5. Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel?
In [162]:
# catplot is a figure-level function and creates its own figure, so its size is set via height/aspect:
sns.catplot(data=df, x='repeated_guest', hue='booking_status', kind='count', height=5, aspect=0.8)
plt.show()
In [163]:
guests_df = df.groupby(['repeated_guest','booking_status']).agg(number=('repeated_guest','count')).reset_index()
guests_df
Out[163]:
repeated_guest booking_status number
0 0 0 23404
1 0 1 11867
2 1 0 910
3 1 1 16
In [164]:
repeated_guests_df = guests_df.query('repeated_guest==1')
repeated_guests_df['percent'] = repeated_guests_df['number']/repeated_guests_df['number'].sum() *100
repeated_guests_df
Out[164]:
repeated_guest booking_status number percent
2 1 0 910 98.27214
3 1 1 16 1.72786

Observations:

  • No significant changes have occurred due to the dropping of records.
  • Of the 926 reservations made by repeating guests:
    • 16 guests have canceled their reservations, which corresponds to around 1.73%.
    • 910 guests have not canceled their reservations, which corresponds to around 98.27%.
Q6. Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?
In [165]:
# catplot is a figure-level function and creates its own figure, so its size is set via height/aspect:
sns.catplot(data=df, x='no_of_special_requests', hue='booking_status', kind='count', height=5, aspect=0.8)
plt.show()
In [166]:
stacked_barplot(df, "no_of_special_requests", "booking_status")
booking_status              0      1    All
no_of_special_requests                     
All                     24314  11883  36197
0                       11200   8543  19743
1                        8636   2703  11339
2                        3718    637   4355
3                         674      0    674
4                          78      0     78
5                           8      0      8
------------------------------------------------------------------------------------------------------------------------
In [167]:
sp_request_df = df.groupby(['no_of_special_requests','booking_status']).agg(number=('no_of_special_requests','count')).reset_index()
sp_request_df
Out[167]:
no_of_special_requests booking_status number
0 0 0 11200
1 0 1 8543
2 1 0 8636
3 1 1 2703
4 2 0 3718
5 2 1 637
6 3 0 674
7 4 0 78
8 5 0 8
In [168]:
special_req_0 = sp_request_df.query('no_of_special_requests == 0')
special_req_0['percent'] = special_req_0['number']/special_req_0['number'].sum() *100
print(special_req_0)
percent_0_req = round(special_req_0.loc[1,'percent'],2)
   no_of_special_requests  booking_status  number  percent
0                       0               0   11200 56.72897
1                       0               1    8543 43.27103
In [169]:
special_req_1 = sp_request_df.query('no_of_special_requests == 1')
special_req_1['percent'] = special_req_1['number']/special_req_1['number'].sum() *100
print(special_req_1)
percent_1_req = round(special_req_1.loc[3,'percent'],2)
   no_of_special_requests  booking_status  number  percent
2                       1               0    8636 76.16192
3                       1               1    2703 23.83808
In [170]:
special_req_2 = sp_request_df.query('no_of_special_requests == 2')
special_req_2['percent'] = special_req_2['number']/special_req_2['number'].sum() *100
print(special_req_2)
percent_2_req = round(special_req_2.loc[5,'percent'],2)
   no_of_special_requests  booking_status  number  percent
4                       2               0    3718 85.37313
5                       2               1     637 14.62687
In [171]:
special_req_3 = sp_request_df.query('no_of_special_requests == 3')
special_req_3['percent'] = special_req_3['number']/special_req_3['number'].sum() *100
print(special_req_3)
   no_of_special_requests  booking_status  number   percent
6                       3               0     674 100.00000
In [172]:
special_req_4 = sp_request_df.query('no_of_special_requests == 4')
special_req_4['percent'] = special_req_4['number']/special_req_4['number'].sum() *100
print(special_req_4)
   no_of_special_requests  booking_status  number   percent
7                       4               0      78 100.00000
In [173]:
special_req_5 = sp_request_df.query('no_of_special_requests == 5')
special_req_5['percent'] = special_req_5['number']/special_req_5['number'].sum() *100
print(special_req_5)
   no_of_special_requests  booking_status  number   percent
8                       5               0       8 100.00000
In [174]:
print(f'Out of the reservations made which had no special requests, {percent_0_req}% of the reservations were cancelled.')
print(f'Out of the reservations made which had 1 special request, {percent_1_req}% of the reservations were cancelled.')
print(f'Out of the reservations made which had 2 special requests, {percent_2_req}% of the reservations were cancelled.')
Out of the reservations made which had no special requests, 43.27% of the reservations were cancelled.
Out of the reservations made which had 1 special request, 23.84% of the reservations were cancelled.
Out of the reservations made which had 2 special requests, 14.63% of the reservations were cancelled.

Observations:

  • No significant changes have occurred due to the dropping of records.
  • The following changes have occurred in the percentage of canceled reservations:
No. of special requests   Canceled reservations before dropping records (%)   Canceled reservations after dropping records (%)
0                         43.21                                               43.27
1                         23.77                                               23.84
2                         14.60                                               14.63
Overall Observations:
  • No significant changes have occurred due to outlier treatment and dropping of records.

Model Building and Evaluation:

Model evaluation criterion

Model can make correct predictions as:

  1. True Positive: Predicting a customer will cancel their booking and in reality, the customer cancels their booking.
  2. True Negative: Predicting a customer will not cancel their booking and in reality, the customer does not cancel their booking.

Model can make wrong predictions as:

  1. False Positive: Predicting a customer will cancel their booking but in reality, the customer does not cancel their booking.
  2. False Negative: Predicting a customer will not cancel their booking but in reality, the customer cancels their booking.

Which case is more important?

  • Both the cases are important as:
    • If we predict that a booking will not be canceled and the booking gets canceled then the hotel will lose resources and will have to bear additional costs of distribution channels.
    • If we predict that a booking will get canceled and the booking doesn't get canceled the hotel might not be able to provide satisfactory services to the customer by assuming that this booking will be canceled. This might damage brand equity.

How to reduce the losses?

  • The hotel would want the F1 score to be maximized. Since the F1 score is the harmonic mean of precision and recall, a higher F1 score means that both False Positives and False Negatives are kept low.
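
To make the criterion concrete, the F1 score can be computed from the counts of true positives, false positives, and false negatives. The counts below are hypothetical and the function name `f1_from_counts` is only illustrative:

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 score as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion-matrix counts for the "Canceled" class.
score = f1_from_counts(tp=80, fp=20, fn=40)
print(round(score, 4))
```

True negatives do not enter the formula, so the score focuses on how well the canceled class is identified.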

Logistic Regression model

Data Preparation for building Logistic Model

In [175]:
# Splitting data into independent(X) and dependent(y) variables:
X = df_log_reg.drop(["booking_status"], axis=1)
Y = df_log_reg["booking_status"]
In [176]:
# Adding constant to independent variable X: 
X = sm.add_constant(X)
In [177]:
# Creating dummy columns: 
X = pd.get_dummies(X,drop_first=True)
In [178]:
X.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 36197 entries, 0 to 36274
Data columns (total 28 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   const                                 36197 non-null  float64
 1   no_of_adults                          36197 non-null  int64  
 2   no_of_children                        36197 non-null  int64  
 3   no_of_weekend_nights                  36197 non-null  int64  
 4   no_of_week_nights                     36197 non-null  int64  
 5   required_car_parking_space            36197 non-null  int64  
 6   lead_time                             36197 non-null  int64  
 7   arrival_year                          36197 non-null  int64  
 8   arrival_month                         36197 non-null  int64  
 9   arrival_date                          36197 non-null  int64  
 10  repeated_guest                        36197 non-null  int64  
 11  no_of_previous_cancellations          36197 non-null  int64  
 12  no_of_previous_bookings_not_canceled  36197 non-null  int64  
 13  avg_price_per_room                    36197 non-null  float64
 14  no_of_special_requests                36197 non-null  int64  
 15  type_of_meal_plan_Meal Plan 2         36197 non-null  uint8  
 16  type_of_meal_plan_Meal Plan 3         36197 non-null  uint8  
 17  type_of_meal_plan_Not Selected        36197 non-null  uint8  
 18  room_type_reserved_Room_Type 2        36197 non-null  uint8  
 19  room_type_reserved_Room_Type 3        36197 non-null  uint8  
 20  room_type_reserved_Room_Type 4        36197 non-null  uint8  
 21  room_type_reserved_Room_Type 5        36197 non-null  uint8  
 22  room_type_reserved_Room_Type 6        36197 non-null  uint8  
 23  room_type_reserved_Room_Type 7        36197 non-null  uint8  
 24  market_segment_type_Complementary     36197 non-null  uint8  
 25  market_segment_type_Corporate         36197 non-null  uint8  
 26  market_segment_type_Offline           36197 non-null  uint8  
 27  market_segment_type_Online            36197 non-null  uint8  
dtypes: float64(2), int64(13), uint8(13)
memory usage: 4.9 MB

Observations:

  • After creating dummy columns from the categorical columns, the X dataset has 28 columns in total.
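
As a small illustration of how the dummy columns arise, the sketch below encodes a made-up frame (the name `toy` and its values are hypothetical). With drop_first=True, the first category in sorted order is dropped and acts as the baseline, which is consistent with there being no Aviation column in the listing above.

```python
import pandas as pd

# Hypothetical frame with one numeric and one categorical column.
toy = pd.DataFrame({
    "lead_time": [10, 50, 120, 5],
    "market_segment_type": ["Online", "Offline", "Corporate", "Online"],
})

# One dummy column per category except the first one in sorted order ("Corporate" here).
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())
```

Numeric columns pass through unchanged, so only the categorical columns add to the column count.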
In [179]:
# Splitting in training and test set
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.30, random_state=1, stratify=Y)
In [180]:
# Train data shape: 
print(f'Indep training df: {X_train.shape}')
print(f'Dep training df: {y_train.shape}')
Indep training df: (25337, 28)
Dep training df: (25337,)
In [181]:
# Test Data shape: 
print(f'Indep test df: {X_test.shape}')
print(f'Dep test df: {y_test.shape}')
Indep test df: (10860, 28)
Dep test df: (10860,)

Observations:

  • No. of rows in the training data = 25337
  • No. of rows in the test data = 10860
  • No. of columns in the independent training data (i.e., X_train) = 28 (27 independent variables + 1 constant column)
  • No. of columns in the dependent training data (i.e., y_train) = 1
  • No. of columns in the independent test data (i.e., X_test) = 28 (27 independent variables + 1 constant column)
  • No. of columns in the dependent test data (i.e., y_test) = 1
In [182]:
# Percentage of classes in dataset: 
# booking_status = 0 = Not_Canceled
# booking_status = 1 = Canceled

df_log_reg['booking_status'].value_counts()/df_log_reg.shape[0]
Out[182]:
0   0.67171
1   0.32829
Name: booking_status, dtype: float64
In [183]:
# Percentage of classes in training dataset:
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
Percentage of classes in training set:
0   0.67171
1   0.32829
Name: booking_status, dtype: float64
In [184]:
# Percentage of classes in test dataset:
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in test set:
0   0.67173
1   0.32827
Name: booking_status, dtype: float64

Observations:

  • Around 67.2% of observations belong to class 0 (Booking status = Not_Canceled) and 32.8% of observations belong to class 1 (Booking status = Canceled), and this is preserved in the train and test sets.

Building Logistic Regression Model and Checking Performance

In [185]:
# Building the model:
model_lg_1 = sm.Logit(y_train, X_train.astype(float)).fit(disp=False)
print(model_lg_1.summary())
                           Logit Regression Results                           
==============================================================================
Dep. Variable:         booking_status   No. Observations:                25337
Model:                          Logit   Df Residuals:                    25309
Method:                           MLE   Df Model:                           27
Date:                Sat, 03 Jun 2023   Pseudo R-squ.:                  0.3340
Time:                        12:00:05   Log-Likelihood:                -10681.
converged:                      False   LL-Null:                       -16037.
Covariance Type:            nonrobust   LLR p-value:                     0.000
========================================================================================================
                                           coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------------------
const                                 -922.8929    121.703     -7.583      0.000   -1161.426    -684.360
no_of_adults                             0.0410      0.038      1.083      0.279      -0.033       0.115
no_of_children                           0.1476      0.063      2.341      0.019       0.024       0.271
no_of_weekend_nights                     0.1558      0.020      7.842      0.000       0.117       0.195
no_of_week_nights                        0.0252      0.012      2.054      0.040       0.001       0.049
required_car_parking_space              -1.5645      0.137    -11.432      0.000      -1.833      -1.296
lead_time                                0.0158      0.000     59.087      0.000       0.015       0.016
arrival_year                             0.4563      0.060      7.565      0.000       0.338       0.574
arrival_month                           -0.0411      0.006     -6.333      0.000      -0.054      -0.028
arrival_date                             0.0002      0.002      0.113      0.910      -0.004       0.004
repeated_guest                          -2.1056      0.676     -3.116      0.002      -3.430      -0.781
no_of_previous_cancellations             0.3083      0.089      3.450      0.001       0.133       0.483
no_of_previous_bookings_not_canceled    -0.5332      0.350     -1.522      0.128      -1.220       0.153
avg_price_per_room                       0.0178      0.001     23.902      0.000       0.016       0.019
no_of_special_requests                  -1.5096      0.031    -49.253      0.000      -1.570      -1.449
type_of_meal_plan_Meal Plan 2            0.2083      0.067      3.090      0.002       0.076       0.340
type_of_meal_plan_Meal Plan 3           13.5283    549.256      0.025      0.980   -1062.994    1090.050
type_of_meal_plan_Not Selected           0.2549      0.053      4.807      0.000       0.151       0.359
room_type_reserved_Room_Type 2          -0.4390      0.132     -3.317      0.001      -0.698      -0.180
room_type_reserved_Room_Type 3           0.0002      1.319      0.000      1.000      -2.586       2.586
room_type_reserved_Room_Type 4          -0.1947      0.053     -3.647      0.000      -0.299      -0.090
room_type_reserved_Room_Type 5          -0.8402      0.209     -4.019      0.000      -1.250      -0.430
room_type_reserved_Room_Type 6          -0.9133      0.154     -5.924      0.000      -1.215      -0.611
room_type_reserved_Room_Type 7          -1.0188      0.320     -3.187      0.001      -1.645      -0.392
market_segment_type_Complementary      -23.2076    753.636     -0.031      0.975   -1500.307    1453.891
market_segment_type_Corporate           -1.3168      0.272     -4.847      0.000      -1.849      -0.784
market_segment_type_Offline             -2.2405      0.260     -8.603      0.000      -2.751      -1.730
market_segment_type_Online              -0.4382      0.257     -1.703      0.089      -0.943       0.066
========================================================================================================

Observations:

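  • Negative coefficients indicate that the probability of canceling a reservation decreases as the corresponding attribute increases, holding the other attributes fixed.
  • Positive coefficients indicate that the probability of canceling a reservation increases as the corresponding attribute increases, holding the other attributes fixed.
  • The summary reports converged: False, and type_of_meal_plan_Meal Plan 3 and market_segment_type_Complementary have very large standard errors (549.256 and 753.636), so these two coefficients should not be interpreted.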
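As a minimal sketch of how the sign of a coefficient moves the predicted probability, the snippet below passes a log-odds value through the logistic function before and after adding a coefficient. The two coefficients are taken from the summary above; the baseline log-odds of -1.0 is a made-up value used only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical baseline log-odds for some booking (not taken from the model).
base_log_odds = -1.0

# Coefficients as reported in the summary above.
coef_special_requests = -1.5096  # negative coefficient
coef_weekend_nights = 0.1558     # positive coefficient

p_base = sigmoid(base_log_odds)
# Effect of one additional special request on the predicted probability.
p_more_requests = sigmoid(base_log_odds + coef_special_requests)
# Effect of one additional weekend night on the predicted probability.
p_more_weekend = sigmoid(base_log_odds + coef_weekend_nights)

print(f"baseline:           {p_base:.3f}")
print(f"+1 special request: {p_more_requests:.3f}")
print(f"+1 weekend night:   {p_more_weekend:.3f}")
```

The negative coefficient lowers the probability relative to the baseline and the positive one raises it, whatever baseline is chosen.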
In [186]:
# Prediction on training set
# Default Threshold is 0.5, if predicted probability is greater than 0.5 the observation will be classified as 1

pred_train = model_lg_1.predict(X_train) > 0.5
pred_train = np.round(pred_train)
pred_train.head()
Out[186]:
33637   1.00000
21234   0.00000
10225   0.00000
24839   0.00000
17406   0.00000
dtype: float16
In [187]:
# Creating confusion matrix for logistic regression model 'model_lg_1' on training data:
# Here, threshold value for classification is 0.5

confusion_matrix_statsmodels(model_lg_1, X_train, y_train)
In [188]:
print("Training performance:")
model_performance_classification_statsmodels(model_lg_1, X_train, y_train)
Training performance:
Out[188]:
Accuracy Recall Precision F1
0 0.80917 0.63753 0.74449 0.68687
In [189]:
# Creating confusion matrix for logistic regression model 'model_lg_1' on testing data:
# Here, threshold value for classification is 0.5

confusion_matrix_statsmodels(model_lg_1, X_test, y_test)
In [190]:
print("Testing performance:")
model_performance_classification_statsmodels(model_lg_1, X_test, y_test)
Testing performance:
Out[190]:
Accuracy Recall Precision F1
0 0.79862 0.62945 0.72154 0.67236

Observation on 'model_lg_1':

  • The number for True Positive, True Negative, False Positive and False Negative observations on training data is as follows:
Type Number of Observations Percentage of Observations (%)
True Positive 5303 20.93
True Negative 15199 59.99
False Positive 1820 7.18
False Negative 3015 11.90
  • The number for True Positive, True Negative, False Positive and False Negative observations on testing data is as follows:
Type Number of Observations Percentage of Observations (%)
True Positive 2244 20.66
True Negative 6429 59.20
False Positive 866 7.97
False Negative 1321 12.16
  • The accuracy score of the model is as follows:
Type of Dataset Accuracy score
Train 0.80917
Test 0.79862
  • The recall score of the model is as follows:
Type of Dataset Recall score
Train 0.63753
Test 0.62945
  • The F1 score of the model is as follows:
Type of Dataset F1 score
Train 0.68687
Test 0.67236
  • As the train and test performances are comparable, the model is not overfitting.
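The scores listed above follow directly from the four confusion-matrix counts. The sketch below recomputes them for the training data from the counts in the first table; the function name classification_metrics is hypothetical and is not the helper used in this notebook.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, recall, precision and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)        # share of actual cancellations that were caught
    precision = tp / (tp + fp)     # share of predicted cancellations that were real
    f1 = 2 * precision * recall / (precision + recall)
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}


# Training counts for model_lg_1 at threshold 0.5, copied from the table above.
train_metrics = classification_metrics(tp=5303, tn=15199, fp=1820, fn=3015)

for name, value in train_metrics.items():
    print(f"{name}: {value:.5f}")
```

The same function applied to the test counts should reproduce the test row in the same way.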

Checking for Multicollinearity

  • In order to make statistical inferences from a logistic regression model, it is important to ensure that there is no multicollinearity present in the data.
In [191]:
vif = checking_vif(X_train)
vif
Out[191]:
feature VIF
0 const 39445082.60959
1 no_of_adults 1.34589
2 no_of_children 2.06631
3 no_of_weekend_nights 1.06979
4 no_of_week_nights 1.09746
5 required_car_parking_space 1.03634
6 lead_time 1.39504
7 arrival_year 1.42292
8 arrival_month 1.27280
9 arrival_date 1.00806
10 repeated_guest 1.71238
11 no_of_previous_cancellations 1.31325
12 no_of_previous_bookings_not_canceled 1.54870
13 avg_price_per_room 2.06591
14 no_of_special_requests 1.25755
15 type_of_meal_plan_Meal Plan 2 1.27488
16 type_of_meal_plan_Meal Plan 3 1.01864
17 type_of_meal_plan_Not Selected 1.27668
18 room_type_reserved_Room_Type 2 1.09628
19 room_type_reserved_Room_Type 3 1.00069
20 room_type_reserved_Room_Type 4 1.36137
21 room_type_reserved_Room_Type 5 1.03098
22 room_type_reserved_Room_Type 6 2.03918
23 room_type_reserved_Room_Type 7 1.09446
24 market_segment_type_Complementary 4.50306
25 market_segment_type_Corporate 18.60399
26 market_segment_type_Offline 69.79655
27 market_segment_type_Online 77.41418
In [192]:
# Checking which columns have VIF >= 5 (excluding the constant):
high_vif_cols = vif.loc[(vif['VIF'] >= 5) & (vif['feature'] != 'const')]
column_list = high_vif_cols['feature'].tolist()

column_list
Out[192]:
['market_segment_type_Corporate',
 'market_segment_type_Offline',
 'market_segment_type_Online']

Observation:

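  • Three dummy variables of market_segment_type (Corporate, Offline and Online) have VIF > 5. As they are dummy levels of the same categorical variable, their high VIF values can usually be ignored. None of the remaining variables exhibit high multicollinearity (i.e. VIF > 5), so their values in the summary are reliable.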
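The VIF table above shows large values only for the market_segment_type dummy columns. One plausible reason, sketched below on synthetic data, is that dummy columns of a single categorical variable become strongly correlated with each other when the dropped reference level is rare. This is an illustration under that assumption, not a reproduction of checking_vif; the function names and the level counts are made up, and VIF is computed here with NumPy as 1 / (1 - R²).

```python
import numpy as np


def vif_values(X):
    """VIF of each column of X, from the R^2 of regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


def dummies_without_reference(counts, rng):
    """Shuffled one-hot columns for levels 1..k-1; level 0 is the dropped reference."""
    labels = rng.permutation(np.repeat(np.arange(len(counts)), counts))
    return np.column_stack(
        [(labels == level).astype(float) for level in range(1, len(counts))]
    )


rng = np.random.default_rng(0)
noise = rng.normal(size=1000)  # an unrelated numeric feature

# Reference level with 10 of 1000 rows (rare) versus 340 of 1000 rows (balanced).
rare_ref = np.column_stack([dummies_without_reference([10, 300, 690], rng), noise])
balanced_ref = np.column_stack([dummies_without_reference([340, 330, 330], rng), noise])

vif_rare = vif_values(rare_ref)
vif_balanced = vif_values(balanced_ref)
print("rare reference level:    ", np.round(vif_rare, 2))
print("balanced reference level:", np.round(vif_balanced, 2))
```

In the rare-reference case the two dummy columns are almost exact complements of each other, which inflates their VIF while leaving the unrelated feature unaffected.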

Dropping high p-value variables (removing insignificant variables with p-value > 0.05)

The steps for dropping the high p_value variables are:

  • Build a model, check the p-values of the variables, and drop the column with the highest p-value.
  • Create a new model without the dropped feature, check the p-values of the variables, and drop the column with the highest p-value.
  • Repeat the above two steps till there are no columns with p-value > 0.05.
In [193]:
# The code below automates the steps above for dropping high p-value variables:

# initial list of columns
cols = X_train.columns.tolist()

# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set
    x_train_aux = X_train[cols]

    # fitting the model
    model = sm.Logit(y_train, x_train_aux).fit(disp=False)

    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)

    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()

    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break

selected_features = cols
print(selected_features)
['const', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'repeated_guest', 'no_of_previous_cancellations', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Corporate', 'market_segment_type_Offline']
In [194]:
# X_train2 contains the data from all the columns whose p_value<0.05.
# We will use this to make our final model

X_train2 = X_train[selected_features]
In [195]:
# Rebuilding model (leaving out columns that have p_value >0.05)
model_lg_2 = sm.Logit(y_train, X_train2).fit(disp=False)
print(model_lg_2.summary())
                           Logit Regression Results                           
==============================================================================
Dep. Variable:         booking_status   No. Observations:                25337
Model:                          Logit   Df Residuals:                    25316
Method:                           MLE   Df Model:                           20
Date:                Sat, 03 Jun 2023   Pseudo R-squ.:                  0.3327
Time:                        12:00:10   Log-Likelihood:                -10701.
converged:                       True   LL-Null:                       -16037.
Covariance Type:            nonrobust   LLR p-value:                     0.000
==================================================================================================
                                     coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------------
const                           -907.3391    121.241     -7.484      0.000   -1144.967    -669.711
no_of_children                     0.1346      0.062      2.154      0.031       0.012       0.257
no_of_weekend_nights               0.1591      0.020      8.033      0.000       0.120       0.198
no_of_week_nights                  0.0278      0.012      2.268      0.023       0.004       0.052
required_car_parking_space        -1.5585      0.137    -11.393      0.000      -1.827      -1.290
lead_time                          0.0159      0.000     59.875      0.000       0.015       0.016
arrival_year                       0.4483      0.060      7.461      0.000       0.331       0.566
arrival_month                     -0.0420      0.006     -6.503      0.000      -0.055      -0.029
repeated_guest                    -2.8683      0.579     -4.952      0.000      -4.004      -1.733
no_of_previous_cancellations       0.2602      0.075      3.482      0.000       0.114       0.407
avg_price_per_room                 0.0184      0.001     25.386      0.000       0.017       0.020
no_of_special_requests            -1.5073      0.030    -49.603      0.000      -1.567      -1.448
type_of_meal_plan_Meal Plan 2      0.1964      0.067      2.919      0.004       0.065       0.328
type_of_meal_plan_Not Selected     0.2681      0.053      5.085      0.000       0.165       0.371
room_type_reserved_Room_Type 2    -0.4365      0.132     -3.302      0.001      -0.696      -0.177
room_type_reserved_Room_Type 4    -0.1848      0.052     -3.558      0.000      -0.287      -0.083
room_type_reserved_Room_Type 5    -0.8572      0.209     -4.111      0.000      -1.266      -0.449
room_type_reserved_Room_Type 6    -0.9242      0.154     -6.013      0.000      -1.225      -0.623
room_type_reserved_Room_Type 7    -1.0388      0.318     -3.267      0.001      -1.662      -0.416
market_segment_type_Corporate     -0.8931      0.102     -8.784      0.000      -1.092      -0.694
market_segment_type_Offline       -1.7932      0.052    -34.465      0.000      -1.895      -1.691
==================================================================================================
In [196]:
# Predicting on training set
# Default Threshold is 0.5, if predicted probability is greater than 0.5 the observation will be classified as 1

pred_train = model_lg_2.predict(X_train2) > 0.5
pred_train = np.round(pred_train)
pred_train.head()
Out[196]:
33637   1.00000
21234   0.00000
10225   0.00000
24839   0.00000
17406   0.00000
dtype: float16
In [197]:
# Creating confusion matrix for logistic regression model 'model_lg_2' on training data:
# Here, threshold value for classification is 0.5

confusion_matrix_statsmodels(model_lg_2, X_train2, y_train)
In [198]:
log_reg_model_train_perf = model_performance_classification_statsmodels(model_lg_2, X_train2, y_train)
print("Training performance:")
log_reg_model_train_perf
Training performance:
Out[198]:
Accuracy Recall Precision F1
0 0.80898 0.63621 0.74472 0.68620
In [199]:
X_test2 = X_test[selected_features]
In [200]:
# Creating confusion matrix for logistic regression model 'model_lg_2' on test data:
# Here, threshold value for classification is 0.5

confusion_matrix_statsmodels(model_lg_2, X_test2, y_test)
In [201]:
log_reg_model_test_perf = model_performance_classification_statsmodels(model_lg_2, X_test2, y_test)
print("Testing performance:")
log_reg_model_test_perf
Testing performance:
Out[201]:
Accuracy Recall Precision F1
0 0.79908 0.62973 0.72256 0.67296

Observations on model_lg_2 :

  • The number for True Positive, True Negative, False Positive and False Negative observations on training data is as follows:
Type Number of Observations Percentage of Observations (%)
True Positive 5292 20.89
True Negative 15205 60.01
False Positive 1814 7.16
False Negative 3026 11.94
  • The number for True Positive, True Negative, False Positive and False Negative observations on testing data is as follows:
Type Number of Observations Percentage of Observations (%)
True Positive 2245 20.67
True Negative 6433 59.24
False Positive 862 7.94
False Negative 1320 12.15
  • The accuracy score of the model is as follows:
Type of Dataset Accuracy score
Train 0.80898
Test 0.79908
  • The recall score of the model is as follows:
Type of Dataset Recall score
Train 0.63621
Test 0.62973
  • The F1 score of the model is as follows:
Type of Dataset F1 score
Train 0.68620
Test 0.67296
  • As the train and test performances are comparable, the model is not overfitting.
  • There is no significant change in model performance. However, model_lg_2 is slightly better than model_lg_1 in terms of F1-score on the test data (0.67296 vs 0.67236).

Model Performance Improvement

Choosing threshold : ROC-AUC
In [202]:
#Plotting ROC-AUC curve for training data:
logit_roc_auc_train = roc_auc_score(y_train, model_lg_2.predict(X_train2))
fpr, tpr, thresholds = roc_curve(y_train, model_lg_2.predict(X_train2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic for training dataset")
plt.legend(loc="lower right")
plt.show()
In [203]:
# Plotting ROC-AUC curve for test data:
logit_roc_auc_test = roc_auc_score(y_test, model_lg_2.predict(X_test2))
fpr, tpr, thresholds = roc_curve(y_test, model_lg_2.predict(X_test2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic for test dataset")
plt.legend(loc="lower right")
plt.show()
In [204]:
# Optimal threshold as per AUC-ROC curve
# The optimal cut off would be where tpr is high and fpr is low
fpr, tpr, thresholds = roc_curve(y_train, model_lg_2.predict(X_train2))

optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_idx)
print(optimal_threshold_auc_roc)
3274
0.3823332425853973
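The rule used in the cell above, np.argmax(tpr - fpr), picks the threshold that maximises the difference between the true positive rate and the false positive rate (often called Youden's J statistic). The sketch below applies the same rule to a tiny made-up example using NumPy only, so each step can be followed by hand. The function name youden_threshold and the toy labels and scores are hypothetical.

```python
import numpy as np


def youden_threshold(y_true, y_score):
    """Return the score threshold that maximises tpr - fpr, and that maximum."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    thresholds = np.unique(y_score)[::-1]  # candidate cut-offs, highest first
    positives = (y_true == 1).sum()
    negatives = (y_true == 0).sum()
    j_values = []
    for t in thresholds:
        predicted = y_score >= t  # classify as 1 when the score reaches the cut-off
        tpr = (predicted & (y_true == 1)).sum() / positives
        fpr = (predicted & (y_true == 0)).sum() / negatives
        j_values.append(tpr - fpr)
    best = int(np.argmax(j_values))
    return thresholds[best], j_values[best]


# Toy labels and predicted probabilities, made up for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

best_threshold, best_j = youden_threshold(y_true, y_score)
print(best_threshold, round(best_j, 4))
```

Because this rule weighs the two error rates equally, the chosen threshold may differ from one that reflects the actual cost of a missed cancellation versus a false alarm.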
In [205]:
# Creating confusion matrix for logistic regression model 'model_lg_2' on train data:
# Here, threshold value for classification  is 0.38 (i.e, optimal_threshold_auc_roc )

confusion_matrix_statsmodels(model_lg_2, X_train2, y_train, threshold=optimal_threshold_auc_roc)
In [206]:
log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
    model_lg_2, X_train2, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
Training performance:
Out[206]:
Accuracy Recall Precision F1
0 0.79907 0.72758 0.68176 0.70393
In [207]:
# Creating confusion matrix for logistic regression model 'model_lg_2' on test data:
# Here, threshold value for classification  is 0.38 (i.e, optimal_threshold_auc_roc )

confusion_matrix_statsmodels(model_lg_2, X_test2, y_test, threshold=optimal_threshold_auc_roc)
In [208]:
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
    model_lg_2, X_test2, y_test, threshold=optimal_threshold_auc_roc
)
print("Testing performance:")
log_reg_model_test_perf_threshold_auc_roc
Testing performance:
Out[208]:
Accuracy Recall Precision F1
0 0.79116 0.72539 0.66735 0.69516

Observations for 'model_lg_2' (with threshold = 0.38):

  • The area under the ROC curve (AUC) is 0.86.
  • The number for True Positive, True Negative, False Positive, and False Negative observations on training data is as follows:
Type Number of Observations Percentage of Observations (%)
True Positive 6052 23.89
True Negative 14194 56.02
False Positive 2825 11.15
False Negative 2266 8.94
  • The number for True Positive, True Negative, False Positive and False Negative observations on testing data is as follows:
Type Number of Observations Percentage of Observations (%)
True Positive 2586 23.81
True Negative 6006 55.30
False Positive 1289 11.87
False Negative 979 9.01
  • The accuracy score of the model is as follows:
Type of Dataset Accuracy score
Train 0.79907
Test 0.79116
  • The recall score of the model is as follows:
Type of Dataset Recall score
Train 0.72758
Test 0.72539
  • The F1 score of the model is as follows:
Type of Dataset F1 score
Train 0.70393
Test 0.69516
  • Model_lg_2 is slightly better when threshold is set to 0.38 compared to when threshold is set to 0.5 in terms of f1-score for test data.

Choosing Threshold using Precision-Recall Curve
In [209]:
# Plotting Precision-Recall Curves:
y_scores = model_lg_2.predict(X_train2)
prec, rec, tre = precision_recall_curve(y_train, y_scores)

def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])


plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
In [210]:
# setting the threshold
optimal_threshold_curve1 = 0.37
In [211]:
# Creating confusion matrix for logistic regression model 'model_lg_2' on train data:
# Here, threshold value for classification  is 0.37 (i.e, optimal_threshold_curve1 )

confusion_matrix_statsmodels(model_lg_2, X_train2, y_train, threshold=optimal_threshold_curve1)
In [212]:
log_reg_model_train_perf_threshold_curve1 = model_performance_classification_statsmodels(
    model_lg_2, X_train2, y_train, threshold=optimal_threshold_curve1
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve1
Training performance:
Out[212]:
Accuracy Recall Precision F1
0 0.79370 0.73732 0.66845 0.70119
In [213]:
# Creating confusion matrix for logistic regression model 'model_lg_2' on test data:
# Here, threshold value for classification  is 0.37 (i.e, optimal_threshold_curve1 )

confusion_matrix_statsmodels(model_lg_2, X_test2, y_test, threshold=optimal_threshold_curve1)
In [214]:
log_reg_model_test_perf_threshold_curve1 = model_performance_classification_statsmodels(
    model_lg_2, X_test2, y_test, threshold=optimal_threshold_curve1
)
print("Testing performance:")
log_reg_model_test_perf_threshold_curve1
Testing performance:
Out[214]:
Accuracy Recall Precision F1
0 0.78508 0.73128 0.65453 0.69078

Observations on 'model_lg_2' (with threshold = 0.37):

  • The number for True Positive, True Negative, False Positive, and False Negative observations on training data is as follows:
Type Number of Observations Percentage of Observations (%)
True Positive 6133 24.21
True Negative 13977 55.16
False Positive 3042 12.01
False Negative 2185 8.62
  • The number for True Positive, True Negative, False Positive and False Negative observations on testing data is as follows:
Type Number of Observations Percentage of Observations (%)
True Positive 2607 24.01
True Negative 5919 54.50
False Positive 1376 12.67
False Negative 958 8.82
  • The accuracy score of the model is as follows:
Type of Dataset Accuracy score
Train 0.79370
Test 0.78508
  • The recall score of the model is as follows:
Type of Dataset Recall score
Train 0.73732
Test 0.73128
  • The F1 score of the model is as follows:
Type of Dataset F1 score
Train 0.70119
Test 0.69078
  • Model_lg_2 is slightly better when the threshold is set to 0.38 compared to when the threshold is set to 0.37 in terms of the F1-score for test data.
In [215]:
# Setting the optimal threshold from precision-recall curve:
optimal_threshold_curve2 = 0.42
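The value 0.42 appears to be read from the plot near the point where the precision and recall curves cross (an assumption based on the nearly equal precision and recall reported at this threshold). A minimal sketch of finding such a crossover programmatically, on made-up data with NumPy only, is given below. The function names, the toy data and the threshold grid are all hypothetical.

```python
import numpy as np


def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall when scores above `threshold` are classified as 1."""
    y_true = np.asarray(y_true)
    predicted = np.asarray(y_score) > threshold
    tp = np.sum(predicted & (y_true == 1))
    fp = np.sum(predicted & (y_true == 0))
    fn = np.sum(~predicted & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn)
    return precision, recall


def crossover_threshold(y_true, y_score, grid):
    """Grid value at which |precision - recall| is smallest."""
    gaps = []
    for t in grid:
        precision, recall = precision_recall_at(y_true, y_score, t)
        gaps.append(abs(precision - recall))
    return grid[int(np.argmin(gaps))]


# Toy labels and predicted probabilities, made up for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
grid = [0.15, 0.25, 0.33, 0.38, 0.50, 0.65]  # hypothetical candidate thresholds

best = crossover_threshold(y_true, y_score, grid)
print(best, precision_recall_at(y_true, y_score, best))
```

With the prec, rec and tre arrays from the precision-recall cell above, the same idea could be written as tre[np.argmin(np.abs(prec[:-1] - rec[:-1]))].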
In [216]:
# Creating confusion matrix for logistic regression model 'model_lg_2' on training data:
# Here, threshold value for classification  is 0.42 (i.e, optimal_threshold_curve2 )

confusion_matrix_statsmodels(model_lg_2, X_train2, y_train, threshold=optimal_threshold_curve2)
In [217]:
log_reg_model_train_perf_threshold_curve2 = model_performance_classification_statsmodels(
    model_lg_2, X_train2, y_train, threshold=optimal_threshold_curve2
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve2
Training performance:
Out[217]:
Accuracy Recall Precision F1
0 0.80321 0.70077 0.70010 0.70043
In [218]:
# Creating confusion matrix for logistic regression model 'model_lg_2' on test data:
# Here, threshold value for classification  is 0.42 (i.e, optimal_threshold_curve2 )

confusion_matrix_statsmodels(model_lg_2, X_test2, y_test, threshold=optimal_threshold_curve2)
In [219]:
log_reg_model_test_perf_threshold_curve2 = model_performance_classification_statsmodels(
    model_lg_2, X_test2, y_test, threshold=optimal_threshold_curve2
)
print("Testing performance:")
log_reg_model_test_perf_threshold_curve2
Testing performance:
Out[219]:
Accuracy Recall Precision F1
0 0.79715 0.70182 0.68699 0.69432

Observations on 'model_lg_2' (with threshold = 0.42):

  • The number for True Positive, True Negative, False Positive, and False Negative observations on training data is as follows:
Type Number of Observations Percentage of Observations (%)
True Positive 5829 23.01
True Negative 14522 57.32
False Positive 2497 9.86
False Negative 2489 9.82
  • The number for True Positive, True Negative, False Positive and False Negative observations on testing data is as follows:
Type Number of Observations Percentage of Observations (%)
True Positive 2502 23.04
True Negative 6155 56.68
False Positive 1140 10.50
False Negative 1063 9.79
  • The accuracy score of the model is as follows:
Type of Dataset Accuracy score
Train 0.80321
Test 0.79715
  • The recall score of the model is as follows:
Type of Dataset Recall score
Train 0.70077
Test 0.70182
  • The F1 score of the model is as follows:
Type of Dataset F1 score
Train 0.70043
Test 0.69432
  • Model_lg_2 is slightly better when the threshold is set to 0.38 compared to when the threshold is set to 0.42 in terms of the F1-score for test data.

Model performance evaluation

In [220]:
# training  performance comparison

models_train_comp_df = pd.concat(
    [
        log_reg_model_train_perf.T,
        log_reg_model_train_perf_threshold_curve1.T,
        log_reg_model_train_perf_threshold_auc_roc.T,
        log_reg_model_train_perf_threshold_curve2.T,
    ],
    axis=1,)

models_train_comp_df.columns = [
    "Logistic Regression-default Threshold",
    "Logistic Regression-0.37 Threshold",
    "Logistic Regression-0.38 Threshold",
    "Logistic Regression-0.42 Threshold",
]

print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
Out[220]:
Logistic Regression-default Threshold Logistic Regression-0.37 Threshold Logistic Regression-0.38 Threshold Logistic Regression-0.42 Threshold
Accuracy 0.80898 0.79370 0.79907 0.80321
Recall 0.63621 0.73732 0.72758 0.70077
Precision 0.74472 0.66845 0.68176 0.70010
F1 0.68620 0.70119 0.70393 0.70043
In [221]:
# testing  performance comparison

models_test_comp_df = pd.concat(
    [
        log_reg_model_test_perf.T,
        log_reg_model_test_perf_threshold_curve1.T,
        log_reg_model_test_perf_threshold_auc_roc.T,
        log_reg_model_test_perf_threshold_curve2.T,
    ],
    axis=1,)

models_test_comp_df.columns = [
    "Logistic Regression-default Threshold",
    "Logistic Regression-0.37 Threshold",
    "Logistic Regression-0.38 Threshold",
    "Logistic Regression-0.42 Threshold",
]

print("Testing performance comparison:")
models_test_comp_df
Testing performance comparison:
Out[221]:
Logistic Regression-default Threshold Logistic Regression-0.37 Threshold Logistic Regression-0.38 Threshold Logistic Regression-0.42 Threshold
Accuracy 0.79908 0.78508 0.79116 0.79715
Recall 0.62973 0.73128 0.72539 0.70182
Precision 0.72256 0.65453 0.66735 0.68699
F1 0.67296 0.69078 0.69516 0.69432

Observations:

  • To maximise F1-score: we should use logistic regression model 'model_lg_2' with a threshold value of 0.38. This gives the highest F1-score (about 70%) on both training and test data.
  • To maximise recall: we should use logistic regression model 'model_lg_2' with a threshold value of 0.37. This gives the highest recall, about 73% on test data and almost 74% on training data.
  • To maximise precision: we should use logistic regression model 'model_lg_2' with the default threshold value (0.5). This gives the highest precision, about 72% on test data and 74% on training data.
  • To maximise accuracy: we should use logistic regression model 'model_lg_2' with the default threshold value (0.5). This gives the highest accuracy, almost 80% on test data and 81% on training data.

Final Model Summary

Coefficient interpretations
In [222]:
# converting coefficients to odds
odds = np.exp(model_lg_2.params)

# finding the percentage change
perc_change_odds = (np.exp(model_lg_2.params) - 1) * 100

# removing limit from number of columns to display
pd.set_option("display.max_columns", None)

# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train2.columns).T
Out[222]:
const no_of_children no_of_weekend_nights no_of_week_nights required_car_parking_space lead_time arrival_year arrival_month repeated_guest no_of_previous_cancellations avg_price_per_room no_of_special_requests type_of_meal_plan_Meal Plan 2 type_of_meal_plan_Not Selected room_type_reserved_Room_Type 2 room_type_reserved_Room_Type 4 room_type_reserved_Room_Type 5 room_type_reserved_Room_Type 6 room_type_reserved_Room_Type 7 market_segment_type_Corporate market_segment_type_Offline
Odds 0.00000 1.14410 1.17249 1.02815 0.21045 1.01601 1.56570 0.95883 0.05679 1.29724 1.01856 0.22150 1.21701 1.30746 0.64632 0.83130 0.42433 0.39687 0.35388 0.40937 0.16642
Change_odd% -100.00000 14.40979 17.24878 2.81478 -78.95541 1.60051 56.57048 -4.11744 -94.32070 29.72363 1.85596 -77.85023 21.70130 30.74559 -35.36760 -16.86992 -57.56697 -60.31341 -64.61245 -59.06341 -83.35808
In [223]:
# converting coefficients to odds
odds = np.exp(model_lg_2.params)

# finding the percentage change
perc_change_odds = round((np.exp(model_lg_2.params) - 1) * 100,2)

# removing limit from number of columns to display
pd.set_option("display.max_columns", None)

# Adding the odds to a data frame
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train2.columns).T
Out[223]:
const no_of_children no_of_weekend_nights no_of_week_nights required_car_parking_space lead_time arrival_year arrival_month repeated_guest no_of_previous_cancellations avg_price_per_room no_of_special_requests type_of_meal_plan_Meal Plan 2 type_of_meal_plan_Not Selected room_type_reserved_Room_Type 2 room_type_reserved_Room_Type 4 room_type_reserved_Room_Type 5 room_type_reserved_Room_Type 6 room_type_reserved_Room_Type 7 market_segment_type_Corporate market_segment_type_Offline
Odds 0.00000 1.14410 1.17249 1.02815 0.21045 1.01601 1.56570 0.95883 0.05679 1.29724 1.01856 0.22150 1.21701 1.30746 0.64632 0.83130 0.42433 0.39687 0.35388 0.40937 0.16642
Change_odd% -100.00000 14.41000 17.25000 2.81000 -78.96000 1.60000 56.57000 -4.12000 -94.32000 29.72000 1.86000 -77.85000 21.70000 30.75000 -35.37000 -16.87000 -57.57000 -60.31000 -64.61000 -59.06000 -83.36000

Coefficient interpretations:

  • constant coefficient: The value of the constant coefficient (const) is -907.3391, which corresponds to odds of essentially 0 (-100% change_odds). It is the log-odds when every feature equals 0, which has no practical meaning here because arrival_year is never 0.
  • no_of_children: Holding all other features constant, a one-unit increase in no_of_children multiplies the odds of cancellation by 1.14410, i.e. a 14.41% increase in the odds of canceling a reservation.
  • no_of_weekend_nights: Holding all other features constant, a one-unit increase in no_of_weekend_nights multiplies the odds of cancellation by 1.17249, i.e. a 17.25% increase in the odds of canceling a reservation.
  • no_of_week_nights: Holding all other features constant, a one-unit increase in no_of_week_nights multiplies the odds of cancellation by 1.02815, i.e. a 2.81% increase in the odds of canceling a reservation.
  • required_car_parking_space: Holding all other features constant, a one-unit increase in required_car_parking_space multiplies the odds of cancellation by 0.21045, i.e. a 78.96% decrease in the odds of canceling a reservation.
  • lead_time: Holding all other features constant, a one-unit increase in lead_time multiplies the odds of cancellation by 1.01601, i.e. a 1.60% increase in the odds of canceling a reservation.
  • arrival_year: Holding all other features constant, a one-unit increase in arrival_year multiplies the odds of cancellation by 1.56570, i.e. a 56.57% increase in the odds of canceling a reservation.
  • arrival_month: Holding all other features constant, a one-unit increase in arrival_month multiplies the odds of cancellation by 0.95883, i.e. a 4.12% decrease in the odds of canceling a reservation.
  • repeated_guest: Holding all other features constant, a one-unit increase in repeated_guest multiplies the odds of cancellation by 0.05679, i.e. a 94.32% decrease in the odds of canceling a reservation.
  • no_of_previous_cancellations: Holding all other features constant, a one-unit increase in no_of_previous_cancellations multiplies the odds of cancellation by 1.29724, i.e. a 29.72% increase in the odds of canceling a reservation.
  • avg_price_per_room: Holding all other features constant, a one-unit increase in avg_price_per_room multiplies the odds of cancellation by 1.01856, i.e. a 1.86% increase in the odds of canceling a reservation.
  • no_of_special_requests: Holding all other features constant, a one-unit increase in no_of_special_requests multiplies the odds of cancellation by 0.22150, i.e. a 77.85% decrease in the odds of canceling a reservation.
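The interpretations above come from exponentiating each coefficient. The sketch below shows the arithmetic for two coefficients copied from the model_lg_2 summary, and also how the effect compounds over several units. The helper names are hypothetical, and the 30-day example is an illustration rather than a result from the notebook; small differences from the table are due to the rounded coefficients.

```python
import math


def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds when a feature increases by `delta` units."""
    return math.exp(coef * delta)


def percent_change_in_odds(coef, delta=1.0):
    """Percentage change in the odds for a `delta`-unit increase in the feature."""
    return (odds_ratio(coef, delta) - 1.0) * 100.0


# Coefficients as printed (rounded) in the model_lg_2 summary.
coef_lead_time = 0.0159
coef_parking = -1.5585

print(f"required_car_parking_space: x{odds_ratio(coef_parking):.5f} "
      f"({percent_change_in_odds(coef_parking):.2f}%)")
print(f"lead_time, +1 day:   x{odds_ratio(coef_lead_time):.5f}")
# Effects multiply on the odds scale, so 30 days is exp(30 * coef), not 30 * 1.6%.
print(f"lead_time, +30 days: x{odds_ratio(coef_lead_time, delta=30):.5f}")
```

A ratio below 1 always means a decrease in the odds, which is why a factor of 0.21045 corresponds to a 78.96% decrease.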

Building a Decision Tree model

Data Preparation for building Decision Tree Model

In [224]:
# Converting columns with 'object' datatypes to categorical columns:
for feature in df_dtree.columns:
    if df_dtree[feature].dtype == 'object':
        df_dtree[feature] = pd.Categorical(df_dtree[feature])  # store the strings as a categorical dtype
In [225]:
df_dtree['type_of_meal_plan'].dtype
Out[225]:
CategoricalDtype(categories=['Meal Plan 1', 'Meal Plan 2', 'Meal Plan 3', 'Not Selected'], ordered=False)
In [226]:
df_dtree['room_type_reserved'].dtype
Out[226]:
CategoricalDtype(categories=['Room_Type 1', 'Room_Type 2', 'Room_Type 3', 'Room_Type 4',
                  'Room_Type 5', 'Room_Type 6', 'Room_Type 7'],
, ordered=False)
In [227]:
df_dtree['market_segment_type'].dtype
Out[227]:
CategoricalDtype(categories=['Aviation', 'Complementary', 'Corporate', 'Offline',
                  'Online'],
, ordered=False)
In [228]:
# One Hot Encoding columns 'room_type_reserved', 'market_segment_type' and 'type_of_meal_plan'.
oneHotCols=['room_type_reserved', 'market_segment_type','type_of_meal_plan']
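As a small illustration of one-hot encoding with a dropped reference level, the sketch below applies pd.get_dummies with drop_first=True to a made-up frame. The toy rows are hypothetical; only the column name and level names are borrowed from the data.

```python
import pandas as pd

# Toy frame, made up for illustration; it is not a sample of the bookings data.
toy = pd.DataFrame(
    {"market_segment_type": ["Aviation", "Online", "Offline", "Online", "Corporate"]}
)

# drop_first=True drops the first level in sorted order ('Aviation' here),
# which then acts as the reference level: its rows are all zeros.
encoded = pd.get_dummies(toy, columns=["market_segment_type"], drop_first=True)

print(encoded.columns.tolist())
print(encoded)
```

Dropping one level avoids a redundant column, since the reference level is implied whenever all the remaining dummy columns are zero.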
In [229]:
# Creating dummy columns: 
df_dtree = pd.get_dummies(df_dtree, columns=oneHotCols, drop_first=True)

df_dtree.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 36197 entries, 0 to 36274
Data columns (total 28 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   no_of_adults                          36197 non-null  int64  
 1   no_of_children                        36197 non-null  int64  
 2   no_of_weekend_nights                  36197 non-null  int64  
 3   no_of_week_nights                     36197 non-null  int64  
 4   required_car_parking_space            36197 non-null  int64  
 5   lead_time                             36197 non-null  int64  
 6   arrival_year                          36197 non-null  int64  
 7   arrival_month                         36197 non-null  int64  
 8   arrival_date                          36197 non-null  int64  
 9   repeated_guest                        36197 non-null  int64  
 10  no_of_previous_cancellations          36197 non-null  int64  
 11  no_of_previous_bookings_not_canceled  36197 non-null  int64  
 12  avg_price_per_room                    36197 non-null  float64
 13  no_of_special_requests                36197 non-null  int64  
 14  booking_status                        36197 non-null  int64  
 15  room_type_reserved_Room_Type 2        36197 non-null  uint8  
 16  room_type_reserved_Room_Type 3        36197 non-null  uint8  
 17  room_type_reserved_Room_Type 4        36197 non-null  uint8  
 18  room_type_reserved_Room_Type 5        36197 non-null  uint8  
 19  room_type_reserved_Room_Type 6        36197 non-null  uint8  
 20  room_type_reserved_Room_Type 7        36197 non-null  uint8  
 21  market_segment_type_Complementary     36197 non-null  uint8  
 22  market_segment_type_Corporate         36197 non-null  uint8  
 23  market_segment_type_Offline           36197 non-null  uint8  
 24  market_segment_type_Online            36197 non-null  uint8  
 25  type_of_meal_plan_Meal Plan 2         36197 non-null  uint8  
 26  type_of_meal_plan_Meal Plan 3         36197 non-null  uint8  
 27  type_of_meal_plan_Not Selected        36197 non-null  uint8  
dtypes: float64(1), int64(14), uint8(13)
memory usage: 4.9 MB
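The column count in the info() output can be checked by hand. With drop_first=True, a categorical feature with k levels yields k - 1 dummy columns, and the first level (here Room_Type 1, Aviation and Meal Plan 1) becomes the baseline. The short sketch below redoes that arithmetic using the level counts printed in the dtype outputs above; the variable names are illustrative only.

```python
# Number of category levels, as printed in the dtype outputs above
levels = {
    "room_type_reserved": 7,
    "market_segment_type": 5,
    "type_of_meal_plan": 4,
}

# drop_first=True keeps (k - 1) dummy columns for a feature with k levels
dummy_cols = sum(k - 1 for k in levels.values())

# int64/float64 columns listed by info(), including the target booking_status
numeric_cols = 15

total_cols = numeric_cols + dummy_cols
print(f"dummy columns: {dummy_cols}, total columns: {total_cols}")
```

This gives 13 dummy columns and 28 columns in total, matching the uint8(13) count and the "total 28 columns" line reported by info().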
In [230]:
# Splitting data into independent(X) and dependent(y) variables:
X = df_dtree.drop("booking_status" , axis=1)
y = df_dtree['booking_status']
In [231]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.30, random_state=1)
In [232]:
# Train data shape: 
print(f'Indep training df: {X_train.shape}')
print(f'Dep training df: {y_train.shape}')
Indep training df: (25337, 27)
Dep training df: (25337,)
In [233]:
# Test Data shape: 
print(f'Indep test df: {X_test.shape}')
print(f'Dep test df: {y_test.shape}')
Indep test df: (10860, 27)
Dep test df: (10860,)

Observations:

  • No. of rows in the training data = 25337
  • No. of rows in the test data = 10860
  • No. of columns in the independent training data (i.e., X_train) = 27
  • No. of columns in the dependent training data (i.e., y_train) = 1
  • No. of columns in the independent test data (i.e., X_test) = 27
  • No. of columns in the dependent test data (i.e., y_test) = 1
In [234]:
# Percentage of classes in dataset: 
# booking_status = 0 = Not_Canceled
# booking_status = 1 = Canceled

df_dtree['booking_status'].value_counts()/df_dtree.shape[0]
Out[234]:
0   0.67171
1   0.32829
Name: booking_status, dtype: float64
In [235]:
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
Percentage of classes in training set:
0   0.67368
1   0.32632
Name: booking_status, dtype: float64
In [236]:
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in test set:
0   0.66713
1   0.33287
Name: booking_status, dtype: float64

Observations:

Around 67% of observations belong to class 0 (Booking status = Not_Canceled) and 33% of observations belong to class 1 (Booking status = Canceled). This ratio is approximately preserved in the train set (67.4% / 32.6%) and the test set (66.7% / 33.3%).
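The split in this notebook is a plain random split, so the class shares match only approximately. If an exact match were wanted, scikit-learn's train_test_split accepts a stratify argument. The sketch below illustrates this on synthetic data; X_demo and y_demo are made-up stand-ins (an assumption), not the hotel dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's data (assumption): 1000 rows, 32.8% in class 1
rng = np.random.default_rng(1)
y_demo = np.array([1] * 328 + [0] * 672)
X_demo = rng.normal(size=(1000, 3))

# stratify=y_demo asks scikit-learn to keep the class shares the same in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=1, stratify=y_demo
)

print("share of class 1 (full) :", y_demo.mean())
print("share of class 1 (train):", y_tr.mean())
print("share of class 1 (test) :", y_te.mean())
```

With roughly 36,000 rows, the unstratified split already lands within about one percentage point of the overall ratio, so stratification would be a refinement here, not a correction.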

Building Decision Tree Model

In [237]:
# Building decision tree model : 
model_dtree1 = DecisionTreeClassifier(random_state=1)
model_dtree1.fit(X_train, y_train)
Out[237]:
DecisionTreeClassifier(random_state=1)
In [238]:
print('Accuracy on training set : ', model_dtree1.score(X_train, y_train))
print('Accuracy on test set : ', model_dtree1.score(X_test, y_test))
Accuracy on training set :  0.994198208154083
Accuracy on test set :  0.8667587476979742
In [239]:
# Confusion matrix for model 'model_dtree1' on training set:
confusion_matrix_sklearn(model_dtree1, X_train, y_train)
In [240]:
dtree_perf_train = model_performance_classification_sklearn(model_dtree1, X_train, y_train)
print('Training data performance:')
dtree_perf_train
Training data performance:
Out[240]:
   Accuracy   Recall  Precision       F1
0   0.99420  0.98670    0.99549  0.99107
In [241]:
# Confusion matrix for model 'model_dtree1' on testing set:
confusion_matrix_sklearn(model_dtree1, X_test, y_test)
In [242]:
dtree_perf_test = model_performance_classification_sklearn(model_dtree1, X_test, y_test)
print('Testing data performance:')
dtree_perf_test
Testing data performance:
Out[242]:
   Accuracy   Recall  Precision       F1
0   0.86676  0.79751    0.80128  0.79939

Observations on model_dtree1:

  • The numbers of True Positive, True Negative, False Positive and False Negative observations on the training data are as follows:
    Type              Number of Observations    Percentage of Observations (%)
    True Positive     8158                      32.20
    True Negative     17032                     67.22
    False Positive    37                        0.15
    False Negative    110                       0.43
  • The numbers of True Positive, True Negative, False Positive and False Negative observations on the testing data are as follows:
    Type              Number of Observations    Percentage of Observations (%)
    True Positive     2883                      26.55
    True Negative     6530                      60.13
    False Positive    715                       6.58
    False Negative    732                       6.74
  • The accuracy score of the model is as follows:
    Type of Dataset    Accuracy score
    Train              0.99420
    Test               0.86676
  • The recall score of the model is as follows:
    Type of Dataset    Recall score
    Train              0.98670
    Test               0.79751
  • The F1 score of the model is as follows:
    Type of Dataset    F1 score
    Train              0.99107
    Test               0.79939
  • The model classifies almost all of the data points in the training set correctly because the tree is allowed to grow to its full size. The training accuracy is 99.42%, with only 147 of the 25337 training observations (37 false positives and 110 false negatives) misclassified.
  • Since the accuracy of the model is 99.42% on the training data but only 86.68% on the test data, and recall drops from 98.67% to 79.75%, the model is overfitting the training data.
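The scores in the observations above can be reproduced directly from the confusion-matrix counts, which also makes the train/test gap explicit. The sketch below is a stand-alone check, not the notebook's model_performance_classification_sklearn helper. The function name classification_metrics is hypothetical, and the test true-negative count used is 6530, the value implied by the reported 60.13% share of the 10860 test rows.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}


# Counts taken from the observations above
train = classification_metrics(tp=8158, tn=17032, fp=37, fn=110)
# TN = 6530 is the count consistent with 60.13% of the 10860 test rows
test = classification_metrics(tp=2883, tn=6530, fp=715, fn=732)

for name in train:
    gap = train[name] - test[name]
    print(f"{name:<9} train={train[name]:.5f}  test={test[name]:.5f}  gap={gap:.5f}")
```

A gap of well over 0.1 on every metric is the usual signature of an unpruned tree that has memorised the training data.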
In [243]:
# Printing the features: 
feature_names = list(X.columns)
print(feature_names)
['no_of_adults', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'arrival_date', 'repeated_guest', 'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled', 'avg_price_per_room', 'no_of_special_requests', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 3', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Meal Plan 3', 'type_of_meal_plan_Not Selected']
In [244]:
# Text report showing the rules of the decision tree model 'model_dtree1':
print(tree.export_text(model_dtree1,feature_names=feature_names,show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- avg_price_per_room <= 201.50
|   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 88.50
|   |   |   |   |   |   |   |   |--- lead_time <= 16.50
|   |   |   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 14
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [128.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  16.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 21.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 13.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  13.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- lead_time >  21.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_previous_bookings_not_canceled <= 11.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- no_of_previous_bookings_not_canceled >  11.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |--- avg_price_per_room >  88.50
|   |   |   |   |   |   |   |   |--- lead_time <= 86.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 27.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 15
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  27.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |--- lead_time >  86.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.00] class: 1
|   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 178.44
|   |   |   |   |   |   |   |   |--- weights: [1666.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  178.44
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 179.78
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  179.78
|   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |--- lead_time <= 65.50
|   |   |   |   |   |   |   |--- no_of_weekend_nights <= 3.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 27.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 59.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 16
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  59.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |--- arrival_date >  27.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [10.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  1.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 40.83
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  40.83
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |--- no_of_weekend_nights >  3.50
|   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 11.00] class: 1
|   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  65.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 99.98
|   |   |   |   |   |   |   |   |--- lead_time <= 82.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 80.38
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  80.38
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [21.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  82.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 88.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [31.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  88.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [8.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |--- avg_price_per_room >  99.98
|   |   |   |   |   |   |   |   |--- lead_time <= 81.00
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 123.25
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 68.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  68.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  123.25
|   |   |   |   |   |   |   |   |   |   |--- weights: [7.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  81.00
|   |   |   |   |   |   |   |   |   |--- lead_time <= 88.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [13.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  88.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |--- avg_price_per_room >  201.50
|   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |--- weights: [0.00, 16.00] class: 1
|   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |--- lead_time >  90.50
|   |   |   |   |--- lead_time <= 117.50
|   |   |   |   |   |--- avg_price_per_room <= 93.58
|   |   |   |   |   |   |--- arrival_date <= 6.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 73.75
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  73.75
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 80.38
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 109.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 53.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  109.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  80.38
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  7.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 5.50
|   |   |   |   |   |   |   |   |   |--- weights: [35.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  5.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |--- arrival_date >  6.50
|   |   |   |   |   |   |   |--- arrival_date <= 27.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 80.12
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 5.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  5.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 7.00] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  80.12
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |--- arrival_date >  27.50
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 82.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [30.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  82.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 6.00] class: 1
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 73.62
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  73.62
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [17.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |--- avg_price_per_room >  93.58
|   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |--- arrival_date <= 9.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 115.00
|   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  115.00
|   |   |   |   |   |   |   |   |   |--- weights: [3.00, 1.00] class: 0
|   |   |   |   |   |   |   |--- arrival_date >  9.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 117.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 57.00] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  117.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 9.00] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 101.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  101.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 101.88
|   |   |   |   |   |   |   |   |--- arrival_date <= 14.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 100.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [10.00, 5.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  100.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [14.00, 1.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [5.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  14.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 111.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 94.75
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  94.75
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 47.00] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  111.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  101.88
|   |   |   |   |   |   |   |   |--- lead_time <= 104.00
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 101.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 12.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  101.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [50.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  104.00
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 18.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |--- lead_time >  117.50
|   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |--- avg_price_per_room <= 92.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 85.38
|   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  85.38
|   |   |   |   |   |   |   |   |--- weights: [7.00, 3.00] class: 0
|   |   |   |   |   |   |--- avg_price_per_room >  92.50
|   |   |   |   |   |   |   |--- weights: [0.00, 18.00] class: 1
|   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |--- arrival_date <= 27.50
|   |   |   |   |   |   |   |   |--- weights: [123.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_date >  27.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 28.50
|   |   |   |   |   |   |   |   |   |--- weights: [2.00, 1.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  28.50
|   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 7.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [49.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 140.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  140.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |--- arrival_date >  7.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 25.00
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 13
|   |   |   |   |   |   |   |   |   |--- arrival_date >  25.00
|   |   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [13.00, 1.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [10.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 125.00
|   |   |   |   |   |   |   |   |   |--- lead_time <= 149.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [61.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- lead_time >  149.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 77.75
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  77.75
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  125.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |--- market_segment_type_Online >  0.50
|   |   |   |--- lead_time <= 13.50
|   |   |   |   |--- lead_time <= 2.50
|   |   |   |   |   |--- avg_price_per_room <= 202.67
|   |   |   |   |   |   |--- avg_price_per_room <= 61.36
|   |   |   |   |   |   |   |--- avg_price_per_room <= 59.97
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [26.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 10.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  10.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |--- avg_price_per_room >  59.97
|   |   |   |   |   |   |   |   |--- weights: [0.00, 10.00] class: 1
|   |   |   |   |   |   |--- avg_price_per_room >  61.36
|   |   |   |   |   |   |   |--- no_of_week_nights <= 7.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 9.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [55.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  3.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 139.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  139.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [28.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  9.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 134.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  134.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 175.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  175.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |--- no_of_week_nights >  7.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |--- avg_price_per_room >  202.67
|   |   |   |   |   |   |--- arrival_date <= 26.00
|   |   |   |   |   |   |   |--- weights: [0.00, 7.00] class: 1
|   |   |   |   |   |   |--- arrival_date >  26.00
|   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |--- lead_time >  2.50
|   |   |   |   |   |--- avg_price_per_room <= 99.38
|   |   |   |   |   |   |--- avg_price_per_room <= 78.81
|   |   |   |   |   |   |   |--- no_of_weekend_nights <= 4.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 7.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [81.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  7.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [28.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |--- no_of_weekend_nights >  4.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |--- avg_price_per_room >  78.81
|   |   |   |   |   |   |   |--- avg_price_per_room <= 79.17
|   |   |   |   |   |   |   |   |--- lead_time <= 5.00
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  5.00
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 10.00] class: 1
|   |   |   |   |   |   |   |--- avg_price_per_room >  79.17
|   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 80.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  80.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [23.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |--- weights: [47.00, 0.00] class: 0
|   |   |   |   |   |--- avg_price_per_room >  99.38
|   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [5.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 191.38
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 20
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  191.38
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 9.00] class: 1
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 4.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 164.33
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  164.33
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  4.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 12.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  12.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 123.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_date >  1.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  123.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 138.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 9.00] class: 1
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  138.00
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |--- arrival_date <= 14.50
|   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 9.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [20.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  9.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 128.55
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  128.55
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [12.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_date >  14.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 223.60
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [45.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [9.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  223.60
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |--- lead_time >  13.50
|   |   |   |   |--- avg_price_per_room <= 111.76
|   |   |   |   |   |--- lead_time <= 27.50
|   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |--- weights: [30.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 70.34
|   |   |   |   |   |   |   |   |   |--- lead_time <= 15.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  15.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  70.34
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 101.60
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  101.60
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 1.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  1.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 96.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [11.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  96.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [44.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 94.05
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  94.05
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 11.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 24.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [44.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  24.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |--- lead_time >  27.50
|   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 72.57
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 30.17
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [17.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  30.17
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  72.57
|   |   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 20
|   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [24.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 67.50
|   |   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  67.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 18.12
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  18.12
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 46.00] class: 1
|   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 61.87
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 46.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  46.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [20.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |--- avg_price_per_room >  61.87
|   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 9.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  9.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [5.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 18
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 16
|   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |--- avg_price_per_room >  111.76
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [11.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 6.00] class: 1
|   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 195.12
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 193.20
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 20
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  193.20
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  195.12
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 82.00] class: 1
|   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 29.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  29.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 95.00
|   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [13.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |--- lead_time >  95.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.00] class: 1
|   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |--- no_of_week_nights <= 8.00
|   |   |   |   |   |   |   |--- weights: [29.00, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  8.00
|   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |--- no_of_special_requests >  0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |--- lead_time <= 91.50
|   |   |   |   |   |--- no_of_weekend_nights <= 2.50
|   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 129.50
|   |   |   |   |   |   |   |   |--- weights: [871.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  129.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 131.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 63.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  63.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  131.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [28.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 6.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  6.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |--- market_segment_type_Corporate <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [14.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- market_segment_type_Corporate >  0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 43.00
|   |   |   |   |   |   |   |   |   |--- weights: [2.00, 1.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  43.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |--- no_of_weekend_nights >  2.50
|   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |--- lead_time >  91.50
|   |   |   |   |   |--- arrival_date <= 13.50
|   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |--- lead_time <= 108.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 4.00] class: 1
|   |   |   |   |   |   |   |--- lead_time >  108.50
|   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |--- lead_time <= 146.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 92.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  92.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 11.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  11.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |--- lead_time >  146.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.00] class: 1
|   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |--- arrival_date >  13.50
|   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |--- lead_time <= 142.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 28.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [26.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  28.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 106.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  106.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  142.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |--- weights: [59.00, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 2.00
|   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |--- no_of_week_nights >  2.00
|   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |--- lead_time <= 9.50
|   |   |   |   |   |--- lead_time <= 4.50
|   |   |   |   |   |   |--- avg_price_per_room <= 241.00
|   |   |   |   |   |   |   |--- no_of_weekend_nights <= 3.50
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 80.90
|   |   |   |   |   |   |   |   |   |   |--- weights: [85.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  80.90
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 85.44
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  85.44
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 16
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [11.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 19.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  19.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_weekend_nights >  3.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 23.00
|   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  23.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |--- avg_price_per_room >  241.00
|   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 243.40
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  243.40
|   |   |   |   |   |   |   |   |   |--- weights: [5.00, 0.00] class: 0
|   |   |   |   |   |--- lead_time >  4.50
|   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |--- arrival_month <= 5.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 12.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |--- arrival_date >  12.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 198.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  198.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  5.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |--- arrival_date >  1.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 24.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 12
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_date >  24.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [19.00, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 138.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 29.50
|   |   |   |   |   |   |   |   |   |--- weights: [104.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  29.50
|   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |--- avg_price_per_room >  138.50
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 5.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  5.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [13.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 10.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  10.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |--- lead_time >  9.50
|   |   |   |   |   |--- no_of_weekend_nights <= 3.50
|   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 118.55
|   |   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [85.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 29
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 19
|   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 100.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  100.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |--- avg_price_per_room >  118.55
|   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 17
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 12
|   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 23
|   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 98.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [74.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  98.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |--- weights: [178.00, 0.00] class: 0
|   |   |   |   |   |--- no_of_weekend_nights >  3.50
|   |   |   |   |   |   |--- arrival_date <= 21.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 6.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 90.10
|   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  90.10
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |--- no_of_week_nights >  6.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 18.00] class: 1
|   |   |   |   |   |   |--- arrival_date >  21.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 10.50
|   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  10.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |--- no_of_special_requests >  1.50
|   |   |   |--- lead_time <= 89.50
|   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |--- weights: [2132.00, 0.00] class: 0
|   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |--- no_of_week_nights <= 9.00
|   |   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- lead_time <= 8.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [29.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  8.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 30.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  30.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- weights: [44.00, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |   |--- weights: [64.00, 0.00] class: 0
|   |   |   |   |   |--- no_of_week_nights >  9.00
|   |   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |   |--- weights: [0.00, 3.00] class: 1
|   |   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |--- lead_time >  89.50
|   |   |   |   |--- avg_price_per_room <= 202.14
|   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 21.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 6.00] class: 1
|   |   |   |   |   |   |   |   |--- arrival_date >  21.00
|   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 21.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 101.00
|   |   |   |   |   |   |   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  101.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [8.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  21.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |--- lead_time <= 150.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 6.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 141.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |--- lead_time >  141.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 144.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  144.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  6.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |--- lead_time >  150.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 4.00] class: 1
|   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 10.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  3.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |--- arrival_date >  10.50
|   |   |   |   |   |   |   |   |   |--- weights: [19.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 90.78
|   |   |   |   |   |   |   |   |   |--- lead_time <= 107.00
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 83.05
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  83.05
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- lead_time >  107.00
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  90.78
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 130.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  130.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 4.00] class: 1
|   |   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |   |--- weights: [41.00, 0.00] class: 0
|   |   |   |   |--- avg_price_per_room >  202.14
|   |   |   |   |   |--- weights: [0.00, 9.00] class: 1
|--- lead_time >  151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |--- lead_time <= 163.50
|   |   |   |   |   |   |--- arrival_date <= 7.00
|   |   |   |   |   |   |   |--- weights: [0.00, 17.00] class: 1
|   |   |   |   |   |   |--- arrival_date >  7.00
|   |   |   |   |   |   |   |--- lead_time <= 161.50
|   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  161.50
|   |   |   |   |   |   |   |   |--- weights: [1.00, 1.00] class: 0
|   |   |   |   |   |--- lead_time >  163.50
|   |   |   |   |   |   |--- lead_time <= 347.50
|   |   |   |   |   |   |   |--- lead_time <= 173.00
|   |   |   |   |   |   |   |   |--- arrival_date <= 23.00
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 3.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- arrival_date >  3.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 168.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  168.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  23.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 8.00] class: 1
|   |   |   |   |   |   |   |--- lead_time >  173.00
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 98.00
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 5.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 88.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  88.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.00] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_month >  5.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 55.21
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  55.21
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  98.00
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 5.00] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  347.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 88.00
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 10.00] class: 1
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  88.00
|   |   |   |   |   |   |   |   |--- arrival_date <= 18.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 93.33
|   |   |   |   |   |   |   |   |   |   |--- weights: [5.00, 2.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  93.33
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  18.50
|   |   |   |   |   |   |   |   |   |--- weights: [2.00, 2.00] class: 0
|   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |--- avg_price_per_room <= 84.62
|   |   |   |   |   |   |--- lead_time <= 244.00
|   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 166.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  166.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 69.34
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  69.34
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 39.00] class: 1
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- weights: [25.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 66.50
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 16.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 8.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  16.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 29.57
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  29.57
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  66.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 4.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  4.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |--- lead_time >  244.00
|   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |--- weights: [39.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 76.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  76.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 19.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  19.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [26.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |--- weights: [31.00, 0.00] class: 0
|   |   |   |   |   |--- avg_price_per_room >  84.62
|   |   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |   |--- lead_time <= 316.00
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  2.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  316.00
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 89.00
|   |   |   |   |   |   |   |   |   |--- weights: [7.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  89.00
|   |   |   |   |   |   |   |   |   |--- weights: [1.00, 4.00] class: 1
|   |   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |   |--- weights: [8.00, 0.00] class: 0
|   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |--- avg_price_per_room <= 7.00
|   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |--- weights: [5.00, 0.00] class: 0
|   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |--- weights: [1.00, 1.00] class: 0
|   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |--- avg_price_per_room >  7.00
|   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |--- weights: [0.00, 527.00] class: 1
|   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |--- lead_time <= 225.00
|   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  225.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.00] class: 1
|   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 6.00] class: 1
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 11.00] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 54.00] class: 1
|   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |--- no_of_special_requests >  0.50
|   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |--- lead_time <= 180.50
|   |   |   |   |   |--- arrival_date <= 22.50
|   |   |   |   |   |   |--- weights: [43.00, 0.00] class: 0
|   |   |   |   |   |--- arrival_date >  22.50
|   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 87.30
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  87.30
|   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 83.12
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 80.42
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  80.42
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  83.12
|   |   |   |   |   |   |   |   |   |   |--- weights: [12.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |--- lead_time >  180.50
|   |   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |   |--- lead_time <= 356.00
|   |   |   |   |   |   |   |   |--- lead_time <= 302.50
|   |   |   |   |   |   |   |   |   |--- weights: [16.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  302.50
|   |   |   |   |   |   |   |   |   |--- no_of_special_requests <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.00, 1.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_special_requests >  1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  356.00
|   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 109.00] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- lead_time <= 299.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 6.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_date >  6.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.00] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  299.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 6.00] class: 1
|   |   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |   |--- weights: [13.00, 0.00] class: 0
|   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |--- lead_time <= 348.50
|   |   |   |   |   |   |   |--- weights: [146.00, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  348.50
|   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 58.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  58.50
|   |   |   |   |   |   |   |   |   |--- no_of_special_requests <= 2.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [5.00, 2.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_special_requests >  2.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |--- lead_time <= 190.50
|   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |--- lead_time >  190.50
|   |   |   |   |   |   |   |--- weights: [1.00, 0.00] class: 0
|   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |--- no_of_week_nights <= 9.50
|   |   |   |   |   |   |--- lead_time <= 336.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 96.56
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 76.21
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 245.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [50.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  245.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  76.21
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 6.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 14
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  6.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  96.56
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 8.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  8.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [12.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 99.33
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  99.33
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- lead_time <= 273.00
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 55.92
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  55.92
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 198.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  198.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |--- lead_time >  273.00
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 70.89
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 69.31
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  69.31
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  70.89
|   |   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  336.50
|   |   |   |   |   |   |   |--- weights: [0.00, 5.00] class: 1
|   |   |   |   |   |--- no_of_week_nights >  9.50
|   |   |   |   |   |   |--- weights: [0.00, 6.00] class: 1
|   |--- avg_price_per_room >  100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |--- weights: [0.00, 2104.00] class: 1
|   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |--- weights: [36.00, 0.00] class: 0
|   |   |--- arrival_month >  11.50
|   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |--- weights: [57.00, 0.00] class: 0
|   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |--- arrival_date <= 8.00
|   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |--- arrival_date >  8.00
|   |   |   |   |   |--- lead_time <= 168.00
|   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |--- lead_time >  168.00
|   |   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |   |--- no_of_children <= 1.00
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.00] class: 1
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [4.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_children >  1.00
|   |   |   |   |   |   |   |   |--- weights: [0.00, 6.00] class: 1
|   |   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |   |--- weights: [0.00, 9.00] class: 1

In [245]:
# Checking important features in 'model_dtree1': 
feature_names = list(X_train.columns)
importances = model_dtree1.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
In [246]:
# Importance of features of model 'model_dtree1'
print(pd.DataFrame(model_dtree1.feature_importances_, columns=["Imp"], index=X_train.columns).sort_values(by="Imp", ascending=False))
                                         Imp
lead_time                            0.35218
avg_price_per_room                   0.17036
market_segment_type_Online           0.09570
arrival_date                         0.08119
no_of_special_requests               0.07139
arrival_month                        0.06212
no_of_week_nights                    0.05013
no_of_weekend_nights                 0.03515
no_of_adults                         0.02907
arrival_year                         0.01138
type_of_meal_plan_Not Selected       0.00930
room_type_reserved_Room_Type 4       0.00840
required_car_parking_space           0.00631
no_of_children                       0.00513
type_of_meal_plan_Meal Plan 2        0.00503
room_type_reserved_Room_Type 5       0.00189
room_type_reserved_Room_Type 2       0.00145
market_segment_type_Offline          0.00140
market_segment_type_Corporate        0.00087
room_type_reserved_Room_Type 6       0.00049
repeated_guest                       0.00026
room_type_reserved_Room_Type 7       0.00023
market_segment_type_Complementary    0.00022
no_of_previous_bookings_not_canceled 0.00019
no_of_previous_cancellations         0.00015
room_type_reserved_Room_Type 3       0.00000
type_of_meal_plan_Meal Plan 3        0.00000

Observation:

  • The model is quite complex, as the long rule listing above shows.
  • The 5 most important features are lead_time, avg_price_per_room, market_segment_type_Online, arrival_date and no_of_special_requests.
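One way to put a number on "quite complex" is to read the depth and leaf count off a fitted tree. The sketch below is illustrative only: it runs on a synthetic dataset from `make_classification` (an assumption, not the INN Hotels data), and the names `full_tree` and `capped_tree` are hypothetical. In this notebook the same two methods could be called on `model_dtree1`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the hotel bookings dataset).
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=1)

# Unrestricted tree, analogous to a fully grown model.
full_tree = DecisionTreeClassifier(random_state=1, class_weight="balanced")
full_tree.fit(X_demo, y_demo)

# Depth-capped tree for comparison.
capped_tree = DecisionTreeClassifier(max_depth=3, random_state=1, class_weight="balanced")
capped_tree.fit(X_demo, y_demo)

for name, model in [("full", full_tree), ("max_depth=3", capped_tree)]:
    print(f"{name}: depth={model.get_depth()}, leaves={model.get_n_leaves()}")
```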

Pre-Pruning

In [247]:
# Creating decision tree model using pre-pruning: 
# Here, max_depth is set to 3
model_dtree4 = DecisionTreeClassifier(max_depth =3, random_state=1, class_weight="balanced")
model_dtree4.fit(X_train, y_train)
Out[247]:
DecisionTreeClassifier(class_weight='balanced', max_depth=3, random_state=1)
In [248]:
# Confusion matrix for model 'model_dtree4' on training data: 
confusion_matrix_sklearn(model_dtree4, X_train, y_train)
In [249]:
dtree_tune3_perf_train = model_performance_classification_sklearn(model_dtree4, X_train, y_train)
print('Training data performance:')
dtree_tune3_perf_train
Training data performance:
Out[249]:
Accuracy Recall Precision F1
0 0.78786 0.73875 0.65515 0.69445
In [250]:
# Confusion matrix for model 'model_dtree4' on test data: 
confusion_matrix_sklearn(model_dtree4, X_test, y_test)
In [251]:
dtree_tune3_perf_test = model_performance_classification_sklearn(model_dtree4, X_test, y_test)
print('Testing data performance:')
dtree_tune3_perf_test
Testing data performance:
Out[251]:
Accuracy Recall Precision F1
0 0.78425 0.72172 0.66118 0.69012
In [252]:
# Visualizing the Decision Tree 'model_dtree4':
plt.figure(figsize=(20, 10))
out = tree.plot_tree(model_dtree4,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
In [253]:
# Text report showing the rules of the decision tree model 'model_dtree4':
print(tree.export_text(model_dtree4,feature_names=feature_names,show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [3447.49, 1182.88] class: 0
|   |   |--- market_segment_type_Online >  0.50
|   |   |   |--- weights: [1850.29, 4202.91] class: 1
|   |--- no_of_special_requests >  0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- weights: [4174.10, 1518.44] class: 0
|   |   |--- no_of_special_requests >  1.50
|   |   |   |--- weights: [2164.24, 239.03] class: 0
|--- lead_time >  151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- weights: [509.14, 1932.15] class: 1
|   |   |--- no_of_special_requests >  0.50
|   |   |   |--- weights: [446.06, 344.75] class: 0
|   |--- avg_price_per_room >  100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- weights: [26.72, 3223.82] class: 1
|   |   |--- arrival_month >  11.50
|   |   |   |--- weights: [50.47, 24.52] class: 0
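Because this tree has only eight leaves, its rules can be transcribed by hand into a plain function, which makes the decision logic easy to audit. The sketch below copies the thresholds from the text report above. The function name `predict_dtree4_rules` and the example bookings are hypothetical, and it assumes class 1 denotes a canceled booking.

```python
def predict_dtree4_rules(lead_time, no_of_special_requests,
                         market_segment_online, avg_price_per_room,
                         arrival_month):
    """Hand transcription of the depth-3 rules printed above.

    Returns 1 (assumed: canceled) or 0 (assumed: not canceled).
    """
    if lead_time <= 151.5:
        if no_of_special_requests <= 0.5:
            # Short lead time, no special requests: the booking channel decides.
            return 1 if market_segment_online > 0.5 else 0
        # Any special request on a short-lead booking ends in a class-0 leaf.
        return 0
    if avg_price_per_room <= 100.04:
        # Long lead time, cheaper room: special requests decide.
        return 1 if no_of_special_requests <= 0.5 else 0
    # Long lead time, pricier room: only arrival_month > 11.5 lands in class 0.
    return 1 if arrival_month <= 11.5 else 0


# Hypothetical bookings for illustration.
print(predict_dtree4_rules(30, 0, 1, 90.0, 6))    # online, short lead, no requests
print(predict_dtree4_rules(200, 1, 1, 120.0, 7))  # long lead, higher price
print(predict_dtree4_rules(30, 2, 1, 90.0, 6))    # short lead, two requests
```

Note that the function returns the majority class of each leaf under the balanced class weights, so it mirrors `model_dtree4.predict` only to the extent that the thresholds were copied correctly.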

In [254]:
#Checking important features in 'model_dtree4': 
feature_names = list(X_train.columns)
importances = model_dtree4.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
In [255]:
# Importance of features of model 'model_dtree4':
print (pd.DataFrame(model_dtree4.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
                                         Imp
lead_time                            0.47199
market_segment_type_Online           0.22971
no_of_special_requests               0.22831
avg_price_per_room                   0.05526
arrival_month                        0.01473
room_type_reserved_Room_Type 3       0.00000
type_of_meal_plan_Meal Plan 3        0.00000
type_of_meal_plan_Meal Plan 2        0.00000
market_segment_type_Offline          0.00000
market_segment_type_Corporate        0.00000
market_segment_type_Complementary    0.00000
room_type_reserved_Room_Type 7       0.00000
room_type_reserved_Room_Type 6       0.00000
room_type_reserved_Room_Type 5       0.00000
room_type_reserved_Room_Type 4       0.00000
no_of_adults                         0.00000
room_type_reserved_Room_Type 2       0.00000
no_of_children                       0.00000
no_of_previous_bookings_not_canceled 0.00000
no_of_previous_cancellations         0.00000
repeated_guest                       0.00000
arrival_date                         0.00000
arrival_year                         0.00000
required_car_parking_space           0.00000
no_of_week_nights                    0.00000
no_of_weekend_nights                 0.00000
type_of_meal_plan_Not Selected       0.00000

Observations on 'model_dtree4':

  • The numbers of True Positive, True Negative, False Positive and False Negative observations on the training data are as follows:
    Type             Number of Observations   Percentage of Observations (%)
    True Positive    6108                     24.11
    True Negative    13854                    54.68
    False Positive   3215                     12.69
    False Negative   2160                     8.53
  • The numbers of True Positive, True Negative, False Positive and False Negative observations on the testing data are as follows (the False Positive count is 1337, which is the value consistent with the reported 12.31% share and the test precision of 0.66118):
    Type             Number of Observations   Percentage of Observations (%)
    True Positive    2609                     24.02
    True Negative    5908                     54.40
    False Positive   1337                     12.31
    False Negative   1006                     9.26
  • The accuracy score of the model is as follows:
    Type of Dataset   Accuracy score
    Train             0.78786
    Test              0.78425
  • The recall score of the model is as follows:
    Type of Dataset   Recall score
    Train             0.73875
    Test              0.72172
  • The F1 score of the model is as follows:
    Type of Dataset   F1 score
    Train             0.69445
    Test              0.69012
  • The most important features for this model are 'lead_time', 'market_segment_type_Online', 'no_of_special_requests', 'avg_price_per_room' and 'arrival_month'.
  • Since the max_depth parameter of the tree is set to 3, this is a much simpler model than model_dtree1, which grows to its full depth.
  • The training accuracy is much lower than that of model_dtree1, which suggests that model_dtree4 is no longer fitting the noise in the training data.
  • The F1-scores on the train and test data are close (0.69445 vs 0.69012), which shows that the model generalizes well to unseen data.
  • However, the test F1-score (about 69.01%) is slightly lower than that of model_dtree1 (about 70.94%).
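The scores quoted above follow directly from the confusion-matrix counts. As a worked check, the sketch below recomputes them from the training counts listed for model_dtree4. The helper name `scores_from_counts` is hypothetical, and class 1 is taken as the positive class.

```python
def scores_from_counts(tp, tn, fp, fn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}


# Training counts reported for model_dtree4.
train_scores = scores_from_counts(tp=6108, tn=13854, fp=3215, fn=2160)
for name, value in train_scores.items():
    print(f"{name}: {value:.5f}")
```

The printed values should match the training row of the performance table (0.78786, 0.73875, 0.65515, 0.69445).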
In [256]:
# Creating decision tree model using pre-pruning: 
# Here, max_depth is set to 5
model_dtree5 = DecisionTreeClassifier(max_depth =5, random_state=1, class_weight="balanced")
model_dtree5.fit(X_train, y_train)
Out[256]:
DecisionTreeClassifier(class_weight='balanced', max_depth=5, random_state=1)
In [257]:
# Confusion matrix for model 'model_dtree5' on training data: 
confusion_matrix_sklearn(model_dtree5, X_train, y_train)
In [258]:
dtree_tune5_perf_train = model_performance_classification_sklearn(model_dtree5, X_train, y_train)
print('Training data performance:')
dtree_tune5_perf_train
Training data performance:
Out[258]:
Accuracy Recall Precision F1
0 0.83325 0.74698 0.74329 0.74513
In [259]:
# Confusion matrix for model 'model_dtree5' on test data: 
confusion_matrix_sklearn(model_dtree5, X_test, y_test)
In [260]:
dtree_tune5_perf_test = model_performance_classification_sklearn(model_dtree5, X_test, y_test)
print('Testing data performance:')
dtree_tune5_perf_test
Testing data performance:
Out[260]:
Accuracy Recall Precision F1
0 0.83011 0.73389 0.75028 0.74199
In [261]:
# Visualizing the Decision Tree 'model_dtree5':
plt.figure(figsize=(20, 10))
out = tree.plot_tree(model_dtree5,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
In [262]:
# Text report showing the rules of the decision tree model 'model_dtree5':
print(tree.export_text(model_dtree5,feature_names=feature_names,show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |--- weights: [1770.13, 157.82] class: 0
|   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |--- weights: [1077.66, 384.59] class: 0
|   |   |   |--- lead_time >  90.50
|   |   |   |   |--- lead_time <= 117.50
|   |   |   |   |   |--- weights: [302.81, 511.77] class: 1
|   |   |   |   |--- lead_time >  117.50
|   |   |   |   |   |--- weights: [296.88, 128.71] class: 0
|   |   |--- market_segment_type_Online >  0.50
|   |   |   |--- lead_time <= 13.50
|   |   |   |   |--- lead_time <= 2.50
|   |   |   |   |   |--- weights: [412.66, 119.51] class: 0
|   |   |   |   |--- lead_time >  2.50
|   |   |   |   |   |--- weights: [403.75, 402.98] class: 0
|   |   |   |--- lead_time >  13.50
|   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |--- weights: [984.89, 3678.89] class: 1
|   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |--- weights: [48.98, 1.53] class: 0
|   |--- no_of_special_requests >  0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |--- lead_time <= 91.50
|   |   |   |   |   |--- weights: [684.30, 7.66] class: 0
|   |   |   |   |--- lead_time >  91.50
|   |   |   |   |   |--- weights: [116.52, 26.05] class: 0
|   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |--- lead_time <= 6.50
|   |   |   |   |   |--- weights: [637.54, 72.01] class: 0
|   |   |   |   |--- lead_time >  6.50
|   |   |   |   |   |--- weights: [2735.73, 1412.72] class: 0
|   |   |--- no_of_special_requests >  1.50
|   |   |   |--- lead_time <= 89.50
|   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |--- weights: [1582.36, 0.00] class: 0
|   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |--- weights: [231.56, 75.08] class: 0
|   |   |   |--- lead_time >  89.50
|   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |--- weights: [291.68, 163.95] class: 0
|   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |--- weights: [58.63, 0.00] class: 0
|--- lead_time >  151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |--- weights: [7.42, 101.13] class: 1
|   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |--- weights: [251.60, 98.06] class: 0
|   |   |   |--- no_of_adults >  1.50
|   |   |   |   |--- avg_price_per_room <= 82.47
|   |   |   |   |   |--- weights: [222.66, 695.63] class: 1
|   |   |   |   |--- avg_price_per_room >  82.47
|   |   |   |   |   |--- weights: [27.46, 1037.32] class: 1
|   |   |--- no_of_special_requests >  0.50
|   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |--- lead_time <= 180.50
|   |   |   |   |   |--- weights: [46.02, 9.19] class: 0
|   |   |   |   |--- lead_time >  180.50
|   |   |   |   |   |--- weights: [32.66, 190.00] class: 1
|   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- weights: [115.78, 4.60] class: 0
|   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |--- weights: [251.60, 140.97] class: 0
|   |--- avg_price_per_room >  100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |--- weights: [0.00, 3223.82] class: 1
|   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |--- arrival_date <= 1.50
|   |   |   |   |   |--- weights: [0.74, 0.00] class: 0
|   |   |   |   |--- arrival_date >  1.50
|   |   |   |   |   |--- weights: [25.98, 0.00] class: 0
|   |   |--- arrival_month >  11.50
|   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |--- weights: [42.31, 0.00] class: 0
|   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |--- arrival_date <= 8.00
|   |   |   |   |   |--- weights: [2.97, 0.00] class: 0
|   |   |   |   |--- arrival_date >  8.00
|   |   |   |   |   |--- weights: [5.20, 24.52] class: 1
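The leaf `weights` in these reports are not raw booking counts. With class_weight="balanced", scikit-learn weights each class by n_samples / (n_classes * class_count), so each leaf shows weighted totals. The sketch below recovers approximate raw counts for one leaf. The class counts (17069 for class 0, 8268 for class 1) are derived from the training confusion matrices reported in this notebook, and the helper name `leaf_raw_counts` is hypothetical.

```python
# Training class counts, derived from the reported confusion matrices:
# class 0 = TN + FP, class 1 = TP + FN.
class_counts = {0: 17069, 1: 8268}
n_samples = sum(class_counts.values())
n_classes = len(class_counts)

# "balanced" weighting: n_samples / (n_classes * count of that class).
class_weights = {c: n_samples / (n_classes * n) for c, n in class_counts.items()}
print({c: round(w, 4) for c, w in class_weights.items()})


def leaf_raw_counts(weighted):
    """Convert a leaf's weighted totals back to approximate sample counts."""
    return [round(w / class_weights[c]) for c, w in enumerate(weighted)]


# Last leaf of the model_dtree5 report above: weights [5.20, 24.52].
print(leaf_raw_counts([5.20, 24.52]))
```

Under these assumptions the weights come out at roughly 0.74 and 1.53, which matches the smallest non-zero leaf values in the reports (a single booking of class 0 or class 1), and the example leaf holds about 7 class-0 and 16 class-1 bookings.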

In [263]:
# Checking important features in 'model_dtree5': 
feature_names = list(X_train.columns)
importances = model_dtree5.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
In [264]:
# Importance of features of model 'model_dtree5'
print (pd.DataFrame(model_dtree5.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
                                         Imp
lead_time                            0.49242
market_segment_type_Online           0.19092
no_of_special_requests               0.18448
avg_price_per_room                   0.04891
no_of_adults                         0.02426
no_of_weekend_nights                 0.02065
market_segment_type_Offline          0.01187
arrival_month                        0.01095
required_car_parking_space           0.00972
no_of_week_nights                    0.00520
arrival_date                         0.00062
room_type_reserved_Room_Type 6       0.00000
type_of_meal_plan_Meal Plan 3        0.00000
type_of_meal_plan_Meal Plan 2        0.00000
market_segment_type_Corporate        0.00000
market_segment_type_Complementary    0.00000
room_type_reserved_Room_Type 7       0.00000
room_type_reserved_Room_Type 4       0.00000
room_type_reserved_Room_Type 5       0.00000
arrival_year                         0.00000
room_type_reserved_Room_Type 3       0.00000
room_type_reserved_Room_Type 2       0.00000
no_of_children                       0.00000
no_of_previous_bookings_not_canceled 0.00000
no_of_previous_cancellations         0.00000
repeated_guest                       0.00000
type_of_meal_plan_Not Selected       0.00000

Observations on model_dtree5:

  • The numbers of True Positive, True Negative, False Positive and False Negative observations on the training data are as follows:
    Type             Number of Observations   Percentage of Observations (%)
    True Positive    6176                     24.38
    True Negative    14936                    58.95
    False Positive   2133                     8.42
    False Negative   2092                     8.26
  • The numbers of True Positive, True Negative, False Positive and False Negative observations on the testing data are as follows:
    Type             Number of Observations   Percentage of Observations (%)
    True Positive    2653                     24.43
    True Negative    6362                     58.58
    False Positive   883                      8.13
    False Negative   962                      8.86
  • The accuracy score of the model is as follows:
    Type of Dataset   Accuracy score
    Train             0.83325
    Test              0.83011
  • The recall score of the model is as follows:
    Type of Dataset   Recall score
    Train             0.74698
    Test              0.73389
  • The F1 score of the model is as follows:
    Type of Dataset   F1 score
    Train             0.74513
    Test              0.74199
  • As the max_depth parameter has been set to 5, the model is somewhat more complex than model_dtree4.
  • This model performs almost identically on the training and test datasets in terms of accuracy, recall and F1-score.
  • It also outperforms the previous model (model_dtree4) on accuracy, recall and F1-score.
  • The most important features for this model are 'lead_time', 'market_segment_type_Online', 'no_of_special_requests', 'avg_price_per_room', and 'no_of_adults'.
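The comparison between max_depth = 3 and max_depth = 5 can be extended to a small sweep that tracks the train and test F1-scores side by side. The sketch below is illustrative only: it uses synthetic data from `make_classification` (an assumption, not the hotel bookings data), and `depth_scores` and the other names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the hotel bookings dataset).
X_demo, y_demo = make_classification(n_samples=2000, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

depth_scores = {}
for depth in [3, 5, 7, None]:  # None lets the tree grow to full depth
    clf = DecisionTreeClassifier(max_depth=depth, random_state=1, class_weight="balanced")
    clf.fit(X_tr, y_tr)
    depth_scores[depth] = (
        f1_score(y_tr, clf.predict(X_tr)),
        f1_score(y_te, clf.predict(X_te)),
    )

for depth, (f1_train, f1_test) in depth_scores.items():
    print(f"max_depth={depth}: train F1={f1_train:.3f}, "
          f"test F1={f1_test:.3f}, gap={f1_train - f1_test:.3f}")
```

A widening gap between the train and test columns as depth grows is the usual sign that the extra depth is fitting noise rather than signal.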
In [265]:
# Creating Decision Tree model (using Grid Search for Hyperparameter Tuning of the model):

# Choosing the type of classifier:
model_dtree2 = DecisionTreeClassifier(random_state=1, class_weight="balanced")

# Grid of parameters to choose from:
parameters = {
    "max_depth": np.arange(2, 7, 2),
    "max_leaf_nodes": [50, 75, 150, 250],
    "min_samples_split": [10, 30, 50, 70],
}

# Scoring used to compare parameter combinations (F1 score, despite the 'acc' in the variable name):
acc_scorer = make_scorer(f1_score)

# Run the grid search:
grid_obj = GridSearchCV(model_dtree2, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)

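# Take the estimator with the best combination of parameters
model_dtree2 = grid_obj.best_estimator_

# Fit the best estimator to the data (redundant but harmless: with the default
# refit=True, GridSearchCV has already refit it on the full training set).
model_dtree2.fit(X_train, y_train)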
Out[265]:
DecisionTreeClassifier(class_weight='balanced', max_depth=6, max_leaf_nodes=50,
                       min_samples_split=30, random_state=1)
In [266]:
print('Accuracy on training set : ', round(model_dtree2.score(X_train, y_train),2))
print('Accuracy on test set : ', round(model_dtree2.score(X_test, y_test),2))
Accuracy on training set :  0.83
Accuracy on test set :  0.83

Observations:

  • The optimum hyperparameter values found by the cross-validated grid search are:
    • max_depth = 6
    • max_leaf_nodes = 50
    • min_samples_split = 30
  • The accuracy scores on the train and test data are both about 0.83, which shows that the model generalizes well to unseen data.
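For context on what the search explored, the sketch below counts the candidate combinations in the same parameter grid and shows where the chosen values and the cross-validated score live on a fitted search object. It runs on synthetic data from `make_classification` (an assumption, not the hotel bookings data), so the values it prints are not this notebook's results. The names `X_demo`, `y_demo` and `search` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.tree import DecisionTreeClassifier

# Same grid as in the grid-search cell above.
parameters = {
    "max_depth": np.arange(2, 7, 2),  # -> 2, 4, 6
    "max_leaf_nodes": [50, 75, 150, 250],
    "min_samples_split": [10, 30, 50, 70],
}
n_candidates = len(ParameterGrid(parameters))
print(f"{n_candidates} candidates x 5 folds = {n_candidates * 5} fits")

# Synthetic stand-in data (assumption: NOT the hotel bookings dataset).
X_demo, y_demo = make_classification(n_samples=600, n_features=10, random_state=1)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=1, class_weight="balanced"),
    parameters,
    scoring=make_scorer(f1_score),
    cv=5,
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Mean cross-validated F1 of the best candidate:", round(search.best_score_, 4))
# With the default refit=True, best_estimator_ is already fitted on all of X_demo.
print("Depth of the refitted best tree:", search.best_estimator_.get_depth())
```

In the notebook's own search, the selected max_depth of 6 sits at the upper edge of the grid and max_leaf_nodes of 50 at the lower edge, so a wider grid may be worth trying.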
In [267]:
# Confusion matrix for model 'model_dtree2' on training data: 
confusion_matrix_sklearn(model_dtree2, X_train, y_train)
In [268]:
dtree_tune_perf_train = model_performance_classification_sklearn(model_dtree2, X_train, y_train)
print('Training data performance:')
dtree_tune_perf_train
Training data performance:
Out[268]:
Accuracy Recall Precision F1
0 0.82954 0.80056 0.71256 0.75400
In [269]:
# Confusion matrix for model 'model_dtree2' on test data: 
confusion_matrix_sklearn(model_dtree2, X_test, y_test)
In [270]:
dtree_tune_perf_test = model_performance_classification_sklearn(model_dtree2, X_test, y_test)
print('Testing data performance:')
dtree_tune_perf_test
Testing data performance:
Out[270]:
Accuracy Recall Precision F1
0 0.82735 0.79198 0.71826 0.75332

Observations on model_dtree2:

  • The numbers of True Positive, True Negative, False Positive and False Negative observations on the training data are as follows:
    Type             Number of Observations   Percentage of Observations (%)
    True Positive    6619                     26.12
    True Negative    14399                    56.83
    False Positive   2670                     10.54
    False Negative   1649                     6.51
  • The numbers of True Positive, True Negative, False Positive and False Negative observations on the testing data are as follows:
    Type             Number of Observations   Percentage of Observations (%)
    True Positive    2863                     26.36
    True Negative    6122                     56.37
    False Positive   1123                     10.34
    False Negative   752                      6.92
  • The accuracy score of the model is as follows:
    Type of Dataset   Accuracy score
    Train             0.82954
    Test              0.82735
  • The recall score of the model is as follows:
    Type of Dataset   Recall score
    Train             0.80056
    Test              0.79198
  • The F1 score of the model is as follows:
    Type of Dataset   F1 score
    Train             0.75400
    Test              0.75332
  • As the grid search selected a tree depth of 6, this model is slightly more complex than model_dtree5 (max_depth = 5). However, its F1-score on both the training and test data is slightly better than that of model_dtree5.
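To make the comparison across the three pre-pruned trees concrete, the sketch below lines up the F1-scores already reported in this notebook and computes the train-test gap for each. No model is refitted here; the dictionary name `reported_f1` is hypothetical.

```python
# Train/test F1-scores as reported in this notebook.
reported_f1 = {
    "model_dtree4 (max_depth=3)": (0.69445, 0.69012),
    "model_dtree5 (max_depth=5)": (0.74513, 0.74199),
    "model_dtree2 (grid search)": (0.75400, 0.75332),
}

for name, (f1_train, f1_test) in reported_f1.items():
    gap = f1_train - f1_test
    print(f"{name}: test F1={f1_test:.5f}, train-test gap={gap:.5f}")

# Model with the highest reported test F1.
best_model = max(reported_f1, key=lambda name: reported_f1[name][1])
print("Highest test F1:", best_model)
```

All three gaps are below half a percentage point, so the choice between these models rests on the test F1-score and on how much tree complexity is acceptable.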
In [271]:
# Visualizing the Decision Tree 'model_dtree2':
plt.figure(figsize=(20, 10))
out = tree.plot_tree(model_dtree2,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
In [272]:
# Text report showing the rules of the decision tree model 'model_dtree2':
print(tree.export_text(model_dtree2, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |--- avg_price_per_room <= 196.50
|   |   |   |   |   |   |--- weights: [1769.39, 133.30] class: 0
|   |   |   |   |   |--- avg_price_per_room >  196.50
|   |   |   |   |   |   |--- weights: [0.74, 24.52] class: 1
|   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |--- lead_time <= 65.50
|   |   |   |   |   |   |--- weights: [946.30, 226.77] class: 0
|   |   |   |   |   |--- lead_time >  65.50
|   |   |   |   |   |   |--- weights: [131.37, 157.82] class: 1
|   |   |   |--- lead_time >  90.50
|   |   |   |   |--- lead_time <= 117.50
|   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |--- weights: [259.03, 505.64] class: 1
|   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |--- weights: [43.79, 6.13] class: 0
|   |   |   |   |--- lead_time >  117.50
|   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |--- weights: [6.68, 32.18] class: 1
|   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |--- weights: [290.20, 96.53] class: 0
|   |   |--- market_segment_type_Online >  0.50
|   |   |   |--- lead_time <= 13.50
|   |   |   |   |--- lead_time <= 2.50
|   |   |   |   |   |--- avg_price_per_room <= 202.67
|   |   |   |   |   |   |--- weights: [411.92, 108.79] class: 0
|   |   |   |   |   |--- avg_price_per_room >  202.67
|   |   |   |   |   |   |--- weights: [0.74, 10.73] class: 1
|   |   |   |   |--- lead_time >  2.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- weights: [303.56, 402.98] class: 1
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- weights: [100.20, 0.00] class: 0
|   |   |   |--- lead_time >  13.50
|   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |--- avg_price_per_room <= 99.82
|   |   |   |   |   |   |--- weights: [531.41, 1231.92] class: 1
|   |   |   |   |   |--- avg_price_per_room >  99.82
|   |   |   |   |   |   |--- weights: [453.48, 2446.98] class: 1
|   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |--- no_of_week_nights <= 8.00
|   |   |   |   |   |   |--- weights: [48.98, 0.00] class: 0
|   |   |   |   |   |--- no_of_week_nights >  8.00
|   |   |   |   |   |   |--- weights: [0.00, 1.53] class: 1
|   |--- no_of_special_requests >  0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |--- lead_time <= 91.50
|   |   |   |   |   |--- no_of_weekend_nights <= 2.50
|   |   |   |   |   |   |--- weights: [684.30, 6.13] class: 0
|   |   |   |   |   |--- no_of_weekend_nights >  2.50
|   |   |   |   |   |   |--- weights: [0.00, 1.53] class: 1
|   |   |   |   |--- lead_time >  91.50
|   |   |   |   |   |--- arrival_date <= 13.50
|   |   |   |   |   |   |--- weights: [48.24, 21.45] class: 0
|   |   |   |   |   |--- arrival_date >  13.50
|   |   |   |   |   |   |--- weights: [68.28, 4.60] class: 0
|   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |--- lead_time <= 6.50
|   |   |   |   |   |--- avg_price_per_room <= 158.43
|   |   |   |   |   |   |--- weights: [553.68, 50.56] class: 0
|   |   |   |   |   |--- avg_price_per_room >  158.43
|   |   |   |   |   |   |--- weights: [83.87, 21.45] class: 0
|   |   |   |   |--- lead_time >  6.50
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- weights: [2593.97, 1411.19] class: 0
|   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |--- weights: [141.76, 1.53] class: 0
|   |   |--- no_of_special_requests >  1.50
|   |   |   |--- lead_time <= 89.50
|   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |--- weights: [1582.36, 0.00] class: 0
|   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |--- weights: [183.32, 75.08] class: 0
|   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |--- weights: [48.24, 0.00] class: 0
|   |   |   |--- lead_time >  89.50
|   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |--- weights: [179.61, 59.76] class: 0
|   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |--- weights: [112.07, 104.19] class: 0
|   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |--- weights: [58.63, 0.00] class: 0
|--- lead_time >  151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |--- avg_price_per_room <= 27.82
|   |   |   |   |   |   |--- weights: [5.94, 4.60] class: 0
|   |   |   |   |   |--- avg_price_per_room >  27.82
|   |   |   |   |   |   |--- weights: [1.48, 96.53] class: 1
|   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |--- lead_time <= 163.50
|   |   |   |   |   |   |--- weights: [3.71, 27.58] class: 1
|   |   |   |   |   |--- lead_time >  163.50
|   |   |   |   |   |   |--- weights: [247.89, 70.48] class: 0
|   |   |   |--- no_of_adults >  1.50
|   |   |   |   |--- avg_price_per_room <= 82.47
|   |   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |   |--- weights: [221.17, 387.65] class: 1
|   |   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |   |--- weights: [1.48, 307.98] class: 1
|   |   |   |   |--- avg_price_per_room >  82.47
|   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |--- weights: [20.78, 1037.32] class: 1
|   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |--- weights: [6.68, 0.00] class: 0
|   |   |--- no_of_special_requests >  0.50
|   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |--- lead_time <= 180.50
|   |   |   |   |   |--- arrival_date <= 22.50
|   |   |   |   |   |   |--- weights: [31.91, 0.00] class: 0
|   |   |   |   |   |--- arrival_date >  22.50
|   |   |   |   |   |   |--- weights: [14.10, 9.19] class: 0
|   |   |   |   |--- lead_time >  180.50
|   |   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |   |--- weights: [14.84, 4.60] class: 0
|   |   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |   |--- weights: [17.81, 185.40] class: 1
|   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |--- weights: [115.04, 3.06] class: 0
|   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |--- weights: [0.74, 1.53] class: 1
|   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- weights: [232.31, 107.26] class: 0
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- weights: [19.30, 33.71] class: 1
|   |--- avg_price_per_room >  100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |--- weights: [0.00, 3223.82] class: 1
|   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |--- weights: [26.72, 0.00] class: 0
|   |   |--- arrival_month >  11.50
|   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |--- weights: [42.31, 0.00] class: 0
|   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |--- weights: [8.16, 24.52] class: 1

In [273]:
#Checking important features in 'model_dtree2':
importances = model_dtree2.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
In [274]:
# Importance of features in model 'model_dtree2':
print (pd.DataFrame(model_dtree2.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
                                         Imp
lead_time                            0.47041
market_segment_type_Online           0.18819
no_of_special_requests               0.17246
avg_price_per_room                   0.06219
arrival_month                        0.02680
no_of_adults                         0.02454
no_of_weekend_nights                 0.01965
required_car_parking_space           0.01410
market_segment_type_Offline          0.01103
no_of_week_nights                    0.00931
arrival_date                         0.00133
arrival_year                         0.00000
room_type_reserved_Room_Type 6       0.00000
type_of_meal_plan_Meal Plan 3        0.00000
type_of_meal_plan_Meal Plan 2        0.00000
market_segment_type_Corporate        0.00000
market_segment_type_Complementary    0.00000
room_type_reserved_Room_Type 7       0.00000
room_type_reserved_Room_Type 4       0.00000
room_type_reserved_Room_Type 5       0.00000
room_type_reserved_Room_Type 3       0.00000
room_type_reserved_Room_Type 2       0.00000
no_of_children                       0.00000
no_of_previous_bookings_not_canceled 0.00000
no_of_previous_cancellations         0.00000
repeated_guest                       0.00000
type_of_meal_plan_Not Selected       0.00000

Observation:

  • The 5 most important features are lead_time, market_segment_type_Online, no_of_special_requests, avg_price_per_room, and arrival_month.
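A cumulative view shows how much of the total importance the top features carry. The sketch below uses the non-zero importances copied from the table above; the names `reported_importances` and `top5_share` are hypothetical.

```python
# Non-zero importances of model_dtree2, copied from the table above.
reported_importances = {
    "lead_time": 0.47041,
    "market_segment_type_Online": 0.18819,
    "no_of_special_requests": 0.17246,
    "avg_price_per_room": 0.06219,
    "arrival_month": 0.02680,
    "no_of_adults": 0.02454,
    "no_of_weekend_nights": 0.01965,
    "required_car_parking_space": 0.01410,
    "market_segment_type_Offline": 0.01103,
    "no_of_week_nights": 0.00931,
    "arrival_date": 0.00133,
}

ranked = sorted(reported_importances.items(), key=lambda kv: kv[1], reverse=True)
cumulative = 0.0
for rank, (feature, imp) in enumerate(ranked, start=1):
    cumulative += imp
    print(f"{rank:2d}. {feature:<28} {imp:.5f}  cumulative={cumulative:.5f}")

# Share of total importance carried by the five highest-ranked features.
top5_share = sum(imp for _, imp in ranked[:5])
print(f"Top-5 share: {top5_share:.5f}")
```

The five listed features account for about 92% of the total importance, and lead_time alone for about 47%.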

Post-Pruning

In [275]:
# Creating Decision Tree model (using Cost Complexity Pruning):
model_dtree3 = DecisionTreeClassifier(random_state=1, class_weight="balanced")
path = model_dtree3.cost_complexity_pruning_path(X_train, y_train)
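# abs() guards against tiny negative alphas caused by floating-point error
# (e.g. the -0.00000 in the table below); a negative ccp_alpha is rejected when fitting.
ccp_alphas, impurities = abs(path.ccp_alphas), path.impurities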
In [276]:
pd.DataFrame(path)
Out[276]:
ccp_alphas impurities
0 0.00000 0.00848
1 -0.00000 0.00848
2 0.00000 0.00848
3 0.00000 0.00848
4 0.00000 0.00848
... ... ...
1661 0.00911 0.32606
1662 0.00960 0.33566
1663 0.01255 0.34820
1664 0.03492 0.41804
1665 0.08196 0.50000

1666 rows × 2 columns
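The last row of the path corresponds to a tree pruned back to its root, and its impurity of 0.50000 is what Gini impurity gives when both classes carry equal total weight, which is what class_weight="balanced" arranges. A small worked check follows, with class counts derived from the training confusion matrices reported earlier (TN + FP and TP + FN) and a hypothetical helper `gini`.

```python
def gini(class_totals):
    """Gini impurity of a node from per-class (possibly weighted) totals."""
    total = sum(class_totals)
    return 1.0 - sum((t / total) ** 2 for t in class_totals)


# Training class counts (class 0, class 1) from the reported confusion matrices.
counts = [17069, 8268]
n_samples, n_classes = sum(counts), len(counts)

# Balanced class weights, then the weighted class totals at the root node.
weights = [n_samples / (n_classes * c) for c in counts]
weighted_totals = [w * c for w, c in zip(weights, counts)]

print("Unweighted root Gini:", round(gini(counts), 5))
print("Balanced root Gini:  ", round(gini(weighted_totals), 5))
```

The balanced value is 0.5 by construction, since each class ends up with a weighted total of n_samples / 2.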

In [277]:
# Plotting Total Impurity vs effective alpha for training set: 
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
In [278]:
# Training decision tree using the effective alphas :
clfs = []
for ccp_alpha in ccp_alphas:
    model_dtree3 = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight="balanced"
    )
    model_dtree3.fit(X_train, y_train)
    clfs.append(model_dtree3)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.0819566740575246
In [279]:
# Plotting No. of Nodes and Depth of the tree vs alpha:
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]

node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()

Observations:

  • The last tree has only 1 node, at a ccp_alpha of approximately 0.082; this alpha prunes the tree back to its root.
  • The plots show that both the depth of the tree and the number of nodes decrease as alpha increases.
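
The relationship between alpha and tree size can be checked numerically as well as visually. The following is a minimal sketch on synthetic data (an assumption, not the hotel dataset); it uses np.clip instead of abs() to remove negative floating-point noise in the alphas, which is an alternative to the approach taken in this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Small synthetic classification problem (assumption for illustration).
X, y = make_classification(n_samples=400, n_features=8, random_state=1)

# Effective alphas along the cost complexity pruning path.
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X, y)
ccp_alphas = np.clip(path.ccp_alphas, 0, None)  # ccp_alpha must be non-negative

# One pruned tree per alpha, in increasing order of alpha.
trees = [
    DecisionTreeClassifier(random_state=1, ccp_alpha=alpha).fit(X, y)
    for alpha in ccp_alphas
]
node_counts = [t.tree_.node_count for t in trees]
depths = [t.get_depth() for t in trees]

# Larger alphas should never give a larger tree.
is_non_increasing = all(a >= b for a, b in zip(node_counts, node_counts[1:]))
print("node counts non-increasing:", is_non_increasing)
print("nodes in last tree:", node_counts[-1], "depth:", depths[-1])
```

The last alpha on the path prunes everything below the root, which is why the final tree is dropped before plotting.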
In [280]:
# Plotting accuracy scores for training and test sets: 
accuracy_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = accuracy_score(y_train, pred_train)
    accuracy_train.append(values_train)

accuracy_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = accuracy_score(y_test, pred_test)
    accuracy_test.append(values_test)

fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Accuracy Score")
ax.set_title("Accuracy Score vs alpha for training and testing sets")
ax.plot(ccp_alphas, accuracy_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, accuracy_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
In [281]:
# Plotting recall scores for training and test sets: 
recall_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = recall_score(y_train, pred_train)
    recall_train.append(values_train)

recall_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = recall_score(y_test, pred_test)
    recall_test.append(values_test)

fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall Score")
ax.set_title("Recall Score vs alpha for training and testing sets")
ax.plot(ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
In [282]:
# Plotting F1 scores for training and test sets: 
f1_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = f1_score(y_train, pred_train)
    f1_train.append(values_train)

f1_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = f1_score(y_test, pred_test)
    f1_test.append(values_test)

fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("F1 Score")
ax.set_title("F1 Score vs alpha for training and testing sets")
ax.plot(ccp_alphas, f1_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, f1_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
In [283]:
# Selecting the model with the highest test F1 score:
index_best_model = np.argmax(f1_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=9.413172181156219e-05, class_weight='balanced',
                       random_state=1)
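
Because best_model is chosen by its F1 score on the test set, the test metrics reported for it are likely to be somewhat optimistic. One possible alternative, sketched here on synthetic data (an assumption, not the hotel dataset), is to choose ccp_alpha by cross-validation on the training set only. The data, the 5-fold setting and the variable names candidate_alphas and search are illustrative choices, not part of the original analysis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced synthetic data standing in for the bookings (assumption).
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.67, 0.33], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

base = DecisionTreeClassifier(random_state=1, class_weight="balanced")

# Candidate alphas from the pruning path, excluding the root-only tree.
path = base.cost_complexity_pruning_path(X_train, y_train)
candidate_alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0, None))

# Pick the alpha with the best mean F1 across 5 training folds.
search = GridSearchCV(base, {"ccp_alpha": candidate_alphas}, scoring="f1", cv=5)
search.fit(X_train, y_train)

best_alpha = search.best_params_["ccp_alpha"]
print("best ccp_alpha:", best_alpha)
print("mean cross-validated F1:", round(search.best_score_, 4))
```

With this approach the test set is used only once, to evaluate search.best_estimator_ after the alpha has been fixed.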
In [284]:
# Confusion matrix for model 'best_model' on training data: 
confusion_matrix_sklearn(best_model, X_train, y_train)
In [285]:
dtree_post_perf_train = model_performance_classification_sklearn(
    best_model, X_train, y_train
)
print('Training Performance')
dtree_post_perf_train
Training Performance
Out[285]:
   Accuracy   Recall  Precision       F1
0   0.91720  0.92900    0.83562  0.87984
In [286]:
# Confusion matrix for model 'best_model' on test data: 
confusion_matrix_sklearn(best_model, X_test, y_test)
In [287]:
dtree_post_perf_test = model_performance_classification_sklearn(
    best_model, X_test, y_test
)
print('Testing Performance')
dtree_post_perf_test
Testing Performance
Out[287]:
   Accuracy   Recall  Precision       F1
0   0.86510  0.85422    0.76701  0.80827

Observations on 'best_model':

  • The best model, chosen as the one with the highest test F1-score, has a ccp_alpha of approximately 0.000094.
  • The numbers of true positive, true negative, false positive and false negative observations on the training data are as follows:
    Type            Number of Observations  Percentage of Observations (%)
    True Positive                     7681                           30.32
    True Negative                    15558                           61.40
    False Positive                    1511                            5.96
    False Negative                     587                            2.32
  • The numbers of true positive, true negative, false positive and false negative observations on the test data are as follows:
    Type            Number of Observations  Percentage of Observations (%)
    True Positive                     3088                           28.43
    True Negative                     6307                           58.08
    False Positive                     938                            8.64
    False Negative                     527                            4.85
  • The accuracy score of the model is as follows:
    Type of Dataset  Accuracy score
    Train                   0.91720
    Test                    0.86510
  • The recall score of the model is as follows:
    Type of Dataset  Recall score
    Train                 0.92900
    Test                  0.85422
  • The F1-score of the model is as follows:
    Type of Dataset  F1-score
    Train             0.87984
    Test              0.80827
  • This model performs better than model_dtree2 (pre-pruned using grid search CV): its F1-score is much higher on both the training and test data. The drop in F1-score from 0.87984 on the training data to 0.80827 on the test data suggests that some overfitting remains.
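
As a consistency check, the reported scores can be recomputed from the confusion-matrix counts listed above. The helper metrics_from_counts is a hypothetical name introduced here for illustration; the counts are the ones given in the observations.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Return accuracy, recall, precision and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)        # share of actual cancellations that were caught
    precision = tp / (tp + fp)     # share of predicted cancellations that were real
    f1 = 2 * precision * recall / (precision + recall)
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}


# Counts taken from the training and test confusion matrices above.
train = metrics_from_counts(tp=7681, tn=15558, fp=1511, fn=587)
test = metrics_from_counts(tp=3088, tn=6307, fp=938, fn=527)

for name, scores in (("Train", train), ("Test", test)):
    print(name, {key: round(value, 5) for key, value in scores.items()})
```

Both sets of values agree with the Training Performance and Testing Performance outputs to five decimal places.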
In [288]:
# Visualizing the Decision Tree for model 'best_model':
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
    best_model,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
In [289]:
# Text report showing the rules of the decision tree model 'best_model':
print(tree.export_text(best_model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |--- avg_price_per_room <= 196.50
|   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 16.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 162.53
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  162.53
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.74, 3.06] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  16.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [23.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 13.00
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [9.65, 7.66] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [11.13, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  13.00
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |--- weights: [118.75, 1.53] class: 0
|   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |--- weights: [1238.72, 1.53] class: 0
|   |   |   |   |   |--- avg_price_per_room >  196.50
|   |   |   |   |   |   |--- weights: [0.74, 24.52] class: 1
|   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |--- lead_time <= 65.50
|   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 62.40
|   |   |   |   |   |   |   |   |--- arrival_date <= 20.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [37.11, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.48, 3.06] class: 1
|   |   |   |   |   |   |   |   |--- arrival_date >  20.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 59.75
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 24.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  24.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [17.07, 1.53] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  59.75
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 24.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.97, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  24.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 61.29] class: 1
|   |   |   |   |   |   |   |--- avg_price_per_room >  62.40
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 3.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 59.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |--- lead_time >  59.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.74, 13.79] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [18.55, 3.06] class: 0
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  3.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.74, 16.85] class: 1
|   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |--- weights: [397.82, 22.98] class: 0
|   |   |   |   |   |--- lead_time >  65.50
|   |   |   |   |   |   |--- avg_price_per_room <= 99.98
|   |   |   |   |   |   |   |--- lead_time <= 82.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 80.38
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 78.03
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  78.03
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 10.73] class: 1
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  80.38
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 92.27
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [23.01, 1.53] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  92.27
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |--- weights: [15.59, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  82.50
|   |   |   |   |   |   |   |   |--- weights: [31.17, 1.53] class: 0
|   |   |   |   |   |   |--- avg_price_per_room >  99.98
|   |   |   |   |   |   |   |--- lead_time <= 81.00
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 123.25
|   |   |   |   |   |   |   |   |   |--- lead_time <= 68.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.23, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  68.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.23, 107.26] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  123.25
|   |   |   |   |   |   |   |   |   |--- weights: [5.94, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  81.00
|   |   |   |   |   |   |   |   |--- lead_time <= 88.50
|   |   |   |   |   |   |   |   |   |--- weights: [10.39, 1.53] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  88.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.74, 3.06] class: 1
|   |   |   |--- lead_time >  90.50
|   |   |   |   |--- lead_time <= 117.50
|   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |--- avg_price_per_room <= 91.22
|   |   |   |   |   |   |   |--- avg_price_per_room <= 75.07
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 60.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [8.16, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  60.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 98.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  98.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [18.55, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  11.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [12.62, 6.13] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |--- avg_price_per_room >  75.07
|   |   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 88.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [57.89, 1.53] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  88.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.53] class: 1
|   |   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 21.45] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.23, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [20.78, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |--- avg_price_per_room >  91.22
|   |   |   |   |   |   |   |--- arrival_date <= 11.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 110.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [10.39, 12.26] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  110.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.97, 22.98] class: 1
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 118.43
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 93.25
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.53] class: 1
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  93.25
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [20.04, 1.53] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  118.43
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.48, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 6.13] class: 1
|   |   |   |   |   |   |   |--- arrival_date >  11.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 102.09
|   |   |   |   |   |   |   |   |   |--- weights: [6.68, 136.37] class: 1
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  102.09
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 109.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.74, 16.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [29.69, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  109.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.48, 78.14] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.97, 0.00] class: 0
|   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |--- weights: [33.40, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |--- weights: [10.39, 6.13] class: 0
|   |   |   |   |--- lead_time >  117.50
|   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |--- avg_price_per_room <= 92.50
|   |   |   |   |   |   |   |--- weights: [6.68, 4.60] class: 0
|   |   |   |   |   |   |--- avg_price_per_room >  92.50
|   |   |   |   |   |   |   |--- weights: [0.00, 27.58] class: 1
|   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |--- weights: [97.23, 1.53] class: 0
|   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |--- arrival_date <= 7.50
|   |   |   |   |   |   |   |   |   |--- weights: [47.50, 4.60] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  7.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 25.00
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [17.81, 6.13] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- arrival_date >  25.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [17.07, 1.53] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 125.00
|   |   |   |   |   |   |   |   |   |--- weights: [85.35, 7.66] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  125.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.53] class: 1
|   |   |--- market_segment_type_Online >  0.50
|   |   |   |--- lead_time <= 13.50
|   |   |   |   |--- lead_time <= 2.50
|   |   |   |   |   |--- avg_price_per_room <= 202.67
|   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 61.36
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [11.88, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.74, 18.39] class: 1
|   |   |   |   |   |   |   |--- avg_price_per_room >  61.36
|   |   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [44.53, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 9.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [66.06, 9.19] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  9.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 29.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  29.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [12.62, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |--- weights: [134.34, 13.79] class: 0
|   |   |   |   |   |--- avg_price_per_room >  202.67
|   |   |   |   |   |   |--- weights: [0.74, 10.73] class: 1
|   |   |   |   |--- lead_time >  2.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- avg_price_per_room <= 94.80
|   |   |   |   |   |   |   |--- avg_price_per_room <= 75.69
|   |   |   |   |   |   |   |   |--- weights: [52.70, 1.53] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  75.69
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [15.59, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 85.75
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  85.75
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [29.69, 6.13] class: 0
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 18.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  18.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.71, 19.92] class: 1
|   |   |   |   |   |   |--- avg_price_per_room >  94.80
|   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |--- weights: [35.63, 4.60] class: 0
|   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 204.86
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  204.86
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 16.85] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [5.94, 9.19] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [6.68, 81.21] class: 1
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 123.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 117.71
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  117.71
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  123.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [8.16, 26.05] class: 1
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- weights: [100.20, 0.00] class: 0
|   |   |   |--- lead_time >  13.50
|   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |--- avg_price_per_room <= 99.82
|   |   |   |   |   |   |--- lead_time <= 27.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [22.27, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [18.55, 1.53] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- lead_time <= 25.50
|   |   |   |   |   |   |   |   |   |--- weights: [35.63, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  25.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.48, 3.06] class: 1
|   |   |   |   |   |   |--- lead_time >  27.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 59.43
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 29.62
|   |   |   |   |   |   |   |   |   |--- weights: [15.59, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  29.62
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 10.00
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.45, 9.19] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  10.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 4.60] class: 1
|   |   |   |   |   |   |   |--- avg_price_per_room >  59.43
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 15
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.74, 38.31] class: 1
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 61.87
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.97, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  61.87
|   |   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [3.71, 1.53] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |--- avg_price_per_room >  99.82
|   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |--- weights: [15.59, 12.26] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |--- weights: [18.55, 0.00] class: 0
|   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.74, 33.71] class: 1
|   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 195.12
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 193.20
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  193.20
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.97, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  195.12
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.74, 130.24] class: 1
|   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 24.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [20.04, 3.06] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.06] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  24.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 189.58
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  189.58
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.45, 1.53] class: 0
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 59.00
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 22.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [14.84, 1.53] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_date >  22.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.74, 3.06] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  59.00
|   |   |   |   |   |   |   |   |   |--- weights: [1.48, 9.19] class: 1
|   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |--- no_of_weekend_nights <= 3.00
|   |   |   |   |   |   |--- weights: [48.98, 0.00] class: 0
|   |   |   |   |   |--- no_of_weekend_nights >  3.00
|   |   |   |   |   |   |--- weights: [0.00, 1.53] class: 1
|   |--- no_of_special_requests >  0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |--- lead_time <= 91.50
|   |   |   |   |   |--- no_of_weekend_nights <= 2.50
|   |   |   |   |   |   |--- weights: [684.30, 6.13] class: 0
|   |   |   |   |   |--- no_of_weekend_nights >  2.50
|   |   |   |   |   |   |--- weights: [0.00, 1.53] class: 1
|   |   |   |   |--- lead_time >  91.50
|   |   |   |   |   |--- arrival_date <= 13.50
|   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |--- lead_time <= 108.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 6.13] class: 1
|   |   |   |   |   |   |   |--- lead_time >  108.50
|   |   |   |   |   |   |   |   |--- weights: [2.23, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |--- lead_time <= 146.50
|   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_date <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 188.57
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [39.34, 1.53] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  188.57
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.53] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_date >  11.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.71, 4.60] class: 1
|   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.48, 3.06] class: 1
|   |   |   |   |   |   |   |--- lead_time >  146.50
|   |   |   |   |   |   |   |   |--- weights: [1.48, 4.60] class: 1
|   |   |   |   |   |--- arrival_date >  13.50
|   |   |   |   |   |   |--- weights: [68.28, 4.60] class: 0
|   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |--- lead_time <= 6.50
|   |   |   |   |   |--- avg_price_per_room <= 158.43
|   |   |   |   |   |   |--- no_of_weekend_nights <= 2.50
|   |   |   |   |   |   |   |--- arrival_date <= 12.50
|   |   |   |   |   |   |   |   |--- lead_time <= 4.50
|   |   |   |   |   |   |   |   |   |--- weights: [196.68, 16.85] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  4.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [25.98, 15.32] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [18.55, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_date >  12.50
|   |   |   |   |   |   |   |   |--- weights: [308.75, 15.32] class: 0
|   |   |   |   |   |   |--- no_of_weekend_nights >  2.50
|   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |--- weights: [3.71, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 3.06] class: 1
|   |   |   |   |   |--- avg_price_per_room >  158.43
|   |   |   |   |   |   |--- arrival_date <= 17.50
|   |   |   |   |   |   |   |--- arrival_date <= 16.50
|   |   |   |   |   |   |   |   |--- weights: [36.37, 9.19] class: 0
|   |   |   |   |   |   |   |--- arrival_date >  16.50
|   |   |   |   |   |   |   |   |--- weights: [2.23, 9.19] class: 1
|   |   |   |   |   |   |--- arrival_date >  17.50
|   |   |   |   |   |   |   |--- weights: [45.27, 3.06] class: 0
|   |   |   |   |--- lead_time >  6.50
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- avg_price_per_room <= 118.55
|   |   |   |   |   |   |   |--- no_of_weekend_nights <= 2.50
|   |   |   |   |   |   |   |   |--- lead_time <= 61.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [69.77, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 16
|   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [133.59, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  61.50
|   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 15
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |--- no_of_weekend_nights >  2.50
|   |   |   |   |   |   |   |   |--- lead_time <= 108.50
|   |   |   |   |   |   |   |   |   |--- weights: [4.45, 30.64] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  108.50
|   |   |   |   |   |   |   |   |   |--- weights: [5.20, 0.00] class: 0
|   |   |   |   |   |   |--- avg_price_per_room >  118.55
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 138.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  138.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.48, 12.26] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 127.85
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.45, 27.58] class: 1
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  127.85
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [25.23, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 13
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 12
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- lead_time <= 98.50
|   |   |   |   |   |   |   |   |   |--- weights: [57.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  98.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.48, 13.79] class: 1
|   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 <= 0.50
|   |   |   |   |   |   |   |--- weights: [141.76, 0.00] class: 0
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 >  0.50
|   |   |   |   |   |   |   |--- weights: [0.00, 1.53] class: 1
|   |   |--- no_of_special_requests >  1.50
|   |   |   |--- lead_time <= 89.50
|   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |--- weights: [1582.36, 0.00] class: 0
|   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 9.00
|   |   |   |   |   |   |   |   |--- lead_time <= 8.50
|   |   |   |   |   |   |   |   |   |--- weights: [29.69, 3.06] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  8.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [14.84, 3.06] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [38.59, 7.66] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  9.00
|   |   |   |   |   |   |   |   |--- weights: [0.00, 4.60] class: 1
|   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |--- weights: [32.66, 0.00] class: 0
|   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |--- weights: [48.24, 0.00] class: 0
|   |   |   |--- lead_time >  89.50
|   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |--- avg_price_per_room <= 202.14
|   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |--- weights: [5.94, 16.85] class: 1
|   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |--- lead_time <= 150.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 6.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 141.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [152.89, 12.26] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  141.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  6.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.06] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  150.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 6.13] class: 1
|   |   |   |   |   |   |--- avg_price_per_room >  202.14
|   |   |   |   |   |   |   |--- weights: [0.00, 13.79] class: 1
|   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |--- arrival_date <= 10.50
|   |   |   |   |   |   |   |   |--- weights: [2.23, 3.06] class: 1
|   |   |   |   |   |   |   |--- arrival_date >  10.50
|   |   |   |   |   |   |   |   |--- weights: [14.10, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 90.78
|   |   |   |   |   |   |   |   |--- lead_time <= 107.00
|   |   |   |   |   |   |   |   |   |--- weights: [5.20, 26.05] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  107.00
|   |   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [5.20, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 82.88
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.48, 10.73] class: 1
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  82.88
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [5.20, 1.53] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  90.78
|   |   |   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [7.42, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 158.86
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 92.60
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  92.60
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  158.86
|   |   |   |   |   |   |   |   |   |   |--- weights: [12.62, 3.06] class: 0
|   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |--- weights: [58.63, 0.00] class: 0
|--- lead_time >  151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |--- avg_price_per_room <= 27.82
|   |   |   |   |   |   |--- weights: [5.94, 4.60] class: 0
|   |   |   |   |   |--- avg_price_per_room >  27.82
|   |   |   |   |   |   |--- weights: [1.48, 96.53] class: 1
|   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |--- lead_time <= 163.50
|   |   |   |   |   |   |--- arrival_month <= 5.00
|   |   |   |   |   |   |   |--- weights: [2.97, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  5.00
|   |   |   |   |   |   |   |--- weights: [0.74, 27.58] class: 1
|   |   |   |   |   |--- lead_time >  163.50
|   |   |   |   |   |   |--- lead_time <= 347.50
|   |   |   |   |   |   |   |--- lead_time <= 173.00
|   |   |   |   |   |   |   |   |--- arrival_date <= 23.00
|   |   |   |   |   |   |   |   |   |--- weights: [43.79, 13.79] class: 0
|   |   |   |   |   |   |   |   |--- arrival_date >  23.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 12.26] class: 1
|   |   |   |   |   |   |   |--- lead_time >  173.00
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 98.00
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 5.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 88.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.45, 1.53] class: 0
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  88.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 4.60] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_month >  5.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 340.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  340.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.45, 3.06] class: 0
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  98.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.74, 7.66] class: 1
|   |   |   |   |   |   |--- lead_time >  347.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 88.00
|   |   |   |   |   |   |   |   |--- weights: [0.74, 15.32] class: 1
|   |   |   |   |   |   |   |--- avg_price_per_room >  88.00
|   |   |   |   |   |   |   |   |--- weights: [5.94, 6.13] class: 1
|   |   |   |--- no_of_adults >  1.50
|   |   |   |   |--- avg_price_per_room <= 82.47
|   |   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |   |--- lead_time <= 170.50
|   |   |   |   |   |   |   |--- arrival_date <= 5.50
|   |   |   |   |   |   |   |   |--- lead_time <= 165.00
|   |   |   |   |   |   |   |   |   |--- weights: [3.71, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  165.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.06] class: 1
|   |   |   |   |   |   |   |--- arrival_date >  5.50
|   |   |   |   |   |   |   |   |--- weights: [46.02, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  170.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 77.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.74, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  77.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_date <= 13.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.74, 9.19] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_date >  13.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [13.36, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 272.00
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 174.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.48, 21.45] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  174.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |--- lead_time >  272.00
|   |   |   |   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [8.91, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- weights: [24.49, 0.00] class: 0
|   |   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |   |--- weights: [1.48, 307.98] class: 1
|   |   |   |   |--- avg_price_per_room >  82.47
|   |   |   |   |   |--- no_of_adults <= 2.50
|   |   |   |   |   |   |--- lead_time <= 325.50
|   |   |   |   |   |   |   |--- market_segment_type_Corporate <= 0.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [10.39, 1000.55] class: 1
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.71, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 21.45] class: 1
|   |   |   |   |   |   |   |--- market_segment_type_Corporate >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.74, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  325.50
|   |   |   |   |   |   |   |--- arrival_year <= 2017.50
|   |   |   |   |   |   |   |   |--- weights: [5.20, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_year >  2017.50
|   |   |   |   |   |   |   |   |--- weights: [0.74, 15.32] class: 1
|   |   |   |   |   |--- no_of_adults >  2.50
|   |   |   |   |   |   |--- weights: [6.68, 0.00] class: 0
|   |   |--- no_of_special_requests >  0.50
|   |   |   |--- no_of_weekend_nights <= 0.50
|   |   |   |   |--- lead_time <= 180.50
|   |   |   |   |   |--- arrival_date <= 22.50
|   |   |   |   |   |   |--- weights: [31.91, 0.00] class: 0
|   |   |   |   |   |--- arrival_date >  22.50
|   |   |   |   |   |   |--- no_of_adults <= 1.50
|   |   |   |   |   |   |   |--- weights: [0.00, 3.06] class: 1
|   |   |   |   |   |   |--- no_of_adults >  1.50
|   |   |   |   |   |   |   |--- weights: [14.10, 6.13] class: 0
|   |   |   |   |--- lead_time >  180.50
|   |   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |   |--- weights: [14.84, 4.60] class: 0
|   |   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 167.01] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- lead_time <= 299.50
|   |   |   |   |   |   |   |   |   |--- weights: [8.16, 9.19] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  299.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 9.19] class: 1
|   |   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |   |--- weights: [9.65, 0.00] class: 0
|   |   |   |--- no_of_weekend_nights >  0.50
|   |   |   |   |--- market_segment_type_Online <= 0.50
|   |   |   |   |   |--- weights: [115.78, 4.60] class: 0
|   |   |   |   |--- market_segment_type_Online >  0.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- no_of_week_nights <= 6.50
|   |   |   |   |   |   |   |--- avg_price_per_room <= 76.21
|   |   |   |   |   |   |   |   |--- weights: [45.27, 3.06] class: 0
|   |   |   |   |   |   |   |--- avg_price_per_room >  76.21
|   |   |   |   |   |   |   |   |--- avg_price_per_room <= 96.56
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 165.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  165.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |--- avg_price_per_room >  96.56
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [11.88, 1.53] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room <= 99.33
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.48, 13.79] class: 1
|   |   |   |   |   |   |   |   |   |   |--- avg_price_per_room >  99.33
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [12.62, 10.73] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  6.50
|   |   |   |   |   |   |   |--- weights: [4.45, 13.79] class: 1
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |   |--- weights: [16.33, 33.71] class: 1
|   |   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |   |--- weights: [2.97, 0.00] class: 0
|   |--- avg_price_per_room >  100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |--- weights: [0.00, 3223.82] class: 1
|   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |--- weights: [26.72, 0.00] class: 0
|   |   |--- arrival_month >  11.50
|   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |--- weights: [42.31, 0.00] class: 0
|   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |--- arrival_date <= 8.00
|   |   |   |   |   |--- weights: [2.97, 0.00] class: 0
|   |   |   |   |--- arrival_date >  8.00
|   |   |   |   |   |--- lead_time <= 168.00
|   |   |   |   |   |   |--- weights: [2.23, 0.00] class: 0
|   |   |   |   |   |--- lead_time >  168.00
|   |   |   |   |   |   |--- weights: [2.97, 24.52] class: 1

In [290]:
# Checking important features in 'best_model': 
feature_names = list(X_train.columns)
importances = best_model.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
In [291]:
# Importance of each feature in 'best_model', sorted from most to least important:
print(
    pd.DataFrame(
        best_model.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
                                         Imp
lead_time                            0.38609
avg_price_per_room                   0.12678
market_segment_type_Online           0.12637
no_of_special_requests               0.11785
arrival_month                        0.06516
arrival_date                         0.04414
no_of_weekend_nights                 0.02901
no_of_adults                         0.02386
no_of_week_nights                    0.02303
arrival_year                         0.01543
market_segment_type_Offline          0.01137
required_car_parking_space           0.00989
type_of_meal_plan_Meal Plan 2        0.00571
type_of_meal_plan_Not Selected       0.00555
room_type_reserved_Room_Type 4       0.00377
repeated_guest                       0.00192
no_of_children                       0.00130
room_type_reserved_Room_Type 5       0.00107
room_type_reserved_Room_Type 2       0.00098
room_type_reserved_Room_Type 7       0.00032
no_of_previous_cancellations         0.00028
market_segment_type_Corporate        0.00015
room_type_reserved_Room_Type 3       0.00000
room_type_reserved_Room_Type 6       0.00000
market_segment_type_Complementary    0.00000
no_of_previous_bookings_not_canceled 0.00000
type_of_meal_plan_Meal Plan 3        0.00000

Observations:

  • The 5 most important features are lead_time, avg_price_per_room, market_segment_type_Online, no_of_special_requests, and arrival_month.
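
To see how concentrated the importances are, the printed values can be accumulated. The sketch below hard-codes the top five rows of the table above for illustration; the dictionary and variable names are assumptions and are not part of the notebook, where the same figure could be derived from best_model.feature_importances_.

```python
from itertools import accumulate

# Top five importances copied from the printed table above
# (hard-coded for illustration; names here are hypothetical).
top_importances = {
    "lead_time": 0.38609,
    "avg_price_per_room": 0.12678,
    "market_segment_type_Online": 0.12637,
    "no_of_special_requests": 0.11785,
    "arrival_month": 0.06516,
}

# Running total of importance as features are added in descending order.
cumulative = list(accumulate(top_importances.values()))

for name, running_total in zip(top_importances, cumulative):
    print(f"{name:<28} cumulative importance = {running_total:.5f}")

# Share of the total importance carried by the five features.
top5_share = cumulative[-1]
```

With these values, the five features together account for about 82% of the total importance, and lead_time alone for about 39%.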

Model performance evaluation

In [292]:
# Training performance comparison
models_train_comp_df = pd.concat(
    [
        dtree_perf_train.T,
        dtree_tune3_perf_train.T,
        dtree_tune5_perf_train.T,
        dtree_tune_perf_train.T,
        dtree_post_perf_train.T
    ],
    axis=1,
)
models_train_comp_df.columns = [
    "Decision Tree sklearn",
    "Decision Tree (Pre-Pruning, max_depth=3)",
    'Decision Tree (Pre-Pruning, max_depth=5)',
    'Decision Tree (Pre-Pruning using GridSearchCV)',
    "Decision Tree (Post-Pruning)"
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
Out[292]:
           Decision Tree sklearn  Decision Tree (Pre-Pruning, max_depth=3)  Decision Tree (Pre-Pruning, max_depth=5)  Decision Tree (Pre-Pruning using GridSearchCV)  Decision Tree (Post-Pruning)
Accuracy                 0.99420                                   0.78786                                   0.83325                                         0.82954                       0.91720
Recall                   0.98670                                   0.73875                                   0.74698                                         0.80056                       0.92900
Precision                0.99549                                   0.65515                                   0.74329                                         0.71256                       0.83562
F1                       0.99107                                   0.69445                                   0.74513                                         0.75400                       0.87984
In [293]:
# Testing performance comparison
models_test_comp_df = pd.concat(
    [
        dtree_perf_test.T,
        dtree_tune3_perf_test.T,
        dtree_tune5_perf_test.T,
        dtree_tune_perf_test.T,
        dtree_post_perf_test.T
    ],
    axis=1,
)
models_test_comp_df.columns = [
    "Decision Tree sklearn",
    "Decision Tree (Pre-Pruning, max_depth=3)",
    'Decision Tree (Pre-Pruning, max_depth=5)',
    'Decision Tree (Pre-Pruning using GridSearchCV)',
    "Decision Tree (Post-Pruning)"
]
print("Testing performance comparison:")
models_test_comp_df
Testing performance comparison:
Out[293]:
           Decision Tree sklearn  Decision Tree (Pre-Pruning, max_depth=3)  Decision Tree (Pre-Pruning, max_depth=5)  Decision Tree (Pre-Pruning using GridSearchCV)  Decision Tree (Post-Pruning)
Accuracy                 0.86676                                   0.78425                                   0.83011                                         0.82735                       0.86510
Recall                   0.79751                                   0.72172                                   0.73389                                         0.79198                       0.85422
Precision                0.80128                                   0.66118                                   0.75028                                         0.71826                       0.76701
F1                       0.79939                                   0.69012                                   0.74199                                         0.75332                       0.80827

Observations:

  • If the sole criterion is to maximise the F1 score, we should use the post-pruned decision tree, named 'best_model'. It gives the highest F1 score on the test data (80.83%) and an F1 score of 87.98% on the training data.
  • The default decision tree has the second-highest test F1 score (79.94%), but its training F1 score of 99.11% shows that it overfits. Among the models that generalize well, 'model_tree2' (i.e., the one whose hyperparameters were tuned using GridSearchCV) comes next, with an F1 score of 75.33% on test data and 75.40% on training data, so it can also be considered a good model.
  • 'lead_time' is the most important feature in all the models.
  • Other important features that affect a reservation being canceled (not necessarily in this order in all models analyzed here) are 'market_segment_type_Online', 'avg_price_per_room' and 'no_of_special_requests'.
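
The comparison can be made explicit by computing the gap between training and test F1 for each model, and by checking that each reported F1 is the harmonic mean of the reported precision and recall. This is a minimal sketch that hard-codes the figures from the two tables above; the dictionary layout and the helper name f1_from are assumptions for illustration, not part of the notebook.

```python
# F1, precision and recall copied from the comparison tables above
# (hard-coded for illustration; the dictionary layout is hypothetical).
scores = {
    "Decision Tree sklearn": {
        "train_f1": 0.99107, "test_f1": 0.79939,
        "test_precision": 0.80128, "test_recall": 0.79751},
    "Decision Tree (Pre-Pruning, max_depth=3)": {
        "train_f1": 0.69445, "test_f1": 0.69012,
        "test_precision": 0.66118, "test_recall": 0.72172},
    "Decision Tree (Pre-Pruning, max_depth=5)": {
        "train_f1": 0.74513, "test_f1": 0.74199,
        "test_precision": 0.75028, "test_recall": 0.73389},
    "Decision Tree (Pre-Pruning using GridSearchCV)": {
        "train_f1": 0.75400, "test_f1": 0.75332,
        "test_precision": 0.71826, "test_recall": 0.79198},
    "Decision Tree (Post-Pruning)": {
        "train_f1": 0.87984, "test_f1": 0.80827,
        "test_precision": 0.76701, "test_recall": 0.85422},
}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Train-minus-test F1 gap per model; a large gap suggests overfitting.
gaps = {name: s["train_f1"] - s["test_f1"] for name, s in scores.items()}

for name, s in scores.items():
    recomputed = f1_from(s["test_precision"], s["test_recall"])
    print(f"{name:<48} gap={gaps[name]:.5f}  recomputed test F1={recomputed:.5f}")
```

With these figures the unpruned tree shows the largest gap (about 0.19), the post-pruned tree a gap of about 0.07, and the three pre-pruned trees gaps below 0.01.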

Actionable Insights and Recommendations

Conclusions from Logistic Regression model:

  • All the logistic regression models generalize well: their performance on the training and test sets is similar.
  • The highest F1 score is 70.39% on the training set and 69.52% on the test set.
  • The model with a 0.38 classification threshold gives a high F1 score, which means a better balance between False Negatives and False Positives. This model will help the hotel balance the additional costs of its distribution channels against providing satisfactory service to customers.
  • The coefficients of no_of_children, no_of_weekend_nights, no_of_week_nights, lead_time, arrival_year, no_of_previous_cancellations, avg_price_per_room and some levels of type_of_meal_plan are positive: an increase in these leads to an increase in the chances of a reservation being canceled.
  • The coefficients of required_car_parking_space, arrival_month, repeated_guest, no_of_special_requests, some levels of room_type_reserved, and some levels of market_segment_type are negative: an increase in these leads to a decrease in the chances of a reservation being canceled.
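
The effect of the classification threshold can be illustrated with a small sketch. The labels and probabilities below are hypothetical (not taken from the dataset), the helper names are assumptions, and the strict "greater than" comparison is one possible convention rather than necessarily the one used in the notebook.

```python
def classify(probabilities, threshold):
    """Label a booking as canceled (1) when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


# Hypothetical true labels and predicted cancellation probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.90, 0.45, 0.40, 0.39, 0.20, 0.10, 0.60, 0.50]

# Compare the default 0.50 threshold with the 0.38 threshold.
for threshold in (0.50, 0.38):
    tp, fp, fn = confusion_counts(y_true, classify(probs, threshold))
    print(f"threshold={threshold:.2f}: TP={tp}, FP={fp}, FN={fn}")
```

On this toy data, lowering the threshold from 0.50 to 0.38 removes both False Negatives at the cost of two False Positives, which is the trade-off the F1 score summarizes.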

Conclusions from Decision Tree model:

  • The post-pruned tree has the highest test F1 score, 80.83%, with an F1 score of 87.98% on the training set.
  • This is the model named 'best_model'. Only the default (unpruned) tree scores higher on training data (99.11%), but it overfits: its test F1 score drops to 79.94%.
  • The model named 'model_tree2' gives an F1 score of almost 75% on both test and training data. It has the best F1 score among the pre-pruned models and generalizes well, so it can also be considered a good model.
  • The most important feature that affects cancellation of a reservation is 'lead_time'.
  • Other important features that affect a reservation being canceled (not necessarily in this order in all models analyzed here) are 'market_segment_type_Online', 'avg_price_per_room' and 'no_of_special_requests'.

Actionable Insights and Recommendations

  • What profitable policies for cancellations and refunds can the hotel adopt?
  • What other recommendations would you suggest to the hotel?

Actionable Insights:

  • The percentage of cancellations is higher if:
      *   The duration between the booking date and the arrival date is long (roughly 4.56 months).
      *   The price of the room is on the higher end.
      *   The number of special requests made by the guests while making a reservation is low.
      *   The reservation is made via online platforms.

    INN Hotels Group needs to investigate what steps to take when a booking has the characteristics mentioned above, so as to reduce the chances of a no-show or cancellation and maximise revenue/profits.
  • October, September, and August are the most popular months among customers. Hence, INN Hotels Group should analyze whether it could offer attractive room prices during these peak months so that customers choose it over its competitors, which in turn leads to higher profit.
  • Also, the hotel chain could look into promotional discounts and offers so that more guests book their stay with it and recommend it to other potential customers.
  • The hotel chain could also further analyze which factors are more desirable to customers so that cancellation rates decrease further. For example, room type 1 seems to be the most popular among customers, and the percentage of cancellations is lower when a guest makes a higher number of special requests.
  • INN Hotels Group needs to further analyze the reasons for, and the steps needed to mitigate, the higher percentages of cancellations in the Online, Offline, and Aviation market segments.
  • As the chances of cancellation are lower for repeated guests, INN Hotels Group needs to further analyze the factors that lead a customer to choose it for a future stay. The hotel can also assess the reasons for different ratings so as to improve its customer service, which may lead to a higher chance of guests booking with it repeatedly.
  • As the Online market segment has the highest share of reservations, INN Hotels Group needs to further analyze the steps needed to increase the chances of securing a reservation and converting a customer into a repeated guest.
  • Also, rooms booked around 4.56 months in advance are the most likely to be canceled. The hotel could apply a no-cancellation policy to such bookings, stated before the reservation is made.
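
One way to act on these insights is to flag bookings that fall into a high-cancellation leaf of the post-pruned tree printed earlier. The sketch below encodes a single leaf (lead_time > 151.50, avg_price_per_room > 100.04, arrival_month <= 11.50, no_of_special_requests <= 2.50, which the tree labels class 1 with weights [0.00, 3223.82]). The function name and the example bookings are hypothetical, and a real policy would rely on the full model rather than on one hand-copied rule.

```python
def matches_high_risk_leaf(booking):
    """Return True when a booking satisfies the conditions of one class-1 leaf
    copied from the printed tree (thresholds taken from that printout)."""
    return (
        booking["lead_time"] > 151.50
        and booking["avg_price_per_room"] > 100.04
        and booking["arrival_month"] <= 11.50
        and booking["no_of_special_requests"] <= 2.50
    )


# Hypothetical bookings for illustration (not rows from the dataset).
bookings = [
    {"id": "A", "lead_time": 200, "avg_price_per_room": 120.0,
     "arrival_month": 8, "no_of_special_requests": 0},
    {"id": "B", "lead_time": 30, "avg_price_per_room": 120.0,
     "arrival_month": 8, "no_of_special_requests": 0},
    {"id": "C", "lead_time": 200, "avg_price_per_room": 120.0,
     "arrival_month": 8, "no_of_special_requests": 3},
]

# Collect the ids of bookings that a stricter cancellation policy could target.
flagged = [b["id"] for b in bookings if matches_high_risk_leaf(b)]
print(flagged)
```

Here only booking A is flagged: B has a short lead time and C has three special requests, both of which route it to a different branch of the tree.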

Recommendations:

  • It is important that guests can see the hotel's cancellation policy during the booking process, so that they fully understand the terms they are booking under. This ensures guests know that a cancellation fee applies if they cancel after the deadline, so that INN Hotels Group does not lose out on revenue.
  • INN Hotels Group can look into offering guests a discount for paying for their stay upfront. This way the guest gets a discounted room and the hotel reduces the chances of no-shows.
  • INN Hotels Group can look into sending guests a reminder about an upcoming reservation, which may decrease the chances of a no-show. Doing so might also build up their enthusiasm about the level of service provided.
  • INN Hotels Group could further analyze customers' repeated no-show behavior so that it can explain to them why there is a no-show fee.
  • INN Hotels Group should analyze whether its website is easy for new customers to navigate so that they can easily book their stay, especially during peak seasons.
  • As a large percentage of the bookings come from its online platform, INN Hotels Group should analyze whether it is easy for repeated customers to rate their stay. Higher ratings may lead to new customers booking their stay, which in turn increases revenue/profit.
  • INN Hotels Group should further analyze the reasons for different ratings in order to improve its customer satisfaction rates.
  • A satisfied customer might recommend the hotel to others, which would help bring in more customers and hence increase profits.
  • INN Hotels Group should further analyze the optimum room price per day during peak holidays.